From pramchan at yahoo.com Sat Aug 1 02:43:36 2020 From: pramchan at yahoo.com (prakash RAMCHANDRAN) Date: Sat, 1 Aug 2020 02:43:36 +0000 (UTC) Subject: [Openstack-Interop] Friday 31 - Updates and Questions for future of Interop? References: <424140393.10386729.1596249816702.ref@mail.yahoo.com> Message-ID: <424140393.10386729.1596249816702@mail.yahoo.com>

Hi all, We looked at the OSF and Open-Infra initiatives, and to sustain and enhance the Interop activity we need your support. The current meeting notes are available on the site, and we will be meeting on alternate Fridays (Aug 14, 28, ...) at 10 AM PST / 17 UTC going forward on meetpad: Link: https://meetpad.opendev.org/Interop-WG-weekly-meeting Details in the etherpad: https://etherpad.opendev.org/p/interop

Do you have suggestions for Branding efforts? A few questions:

1. With the Interop WG being OSF/Board driven, how can Interop work with projects to ensure Branding Integrated Logo Programs, and how can other cloud providers leverage the OpenStack Logo Program?

a. We do have a role in guiding & reviewing refstack reports for Branding. We have decided to decouple the Marketplace & Branding programs. Seeking feedback from the community on proposed add-on programs for Bare metal (Ironic, MaaS, ...) & "Kubernetes-ready OpenStack".

b. Refstack hosting / Refstack-client are currently not maintained due to a lack of volunteers to support them (seeking volunteers for updates - please reach out to @gmann in the TC).

c. We await the annual user & operator survey reports, but if you have any Branding innovation you would like to propose or contribute, please brainstorm your ideas here and bring them to the midweek community meetings next week.

Thanks, For the Interop WG, Prakash -------------- next part -------------- An HTML attachment was scrubbed... URL: From anilj.mailing at gmail.com Sat Aug 1 21:36:37 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Sat, 1 Aug 2020 14:36:37 -0700 Subject: Two subnets under same network context. Message-ID:

Hi, I have observed that one can create two subnets under the same network scope. See below an example of the use case. [image: Screen Shot 2020-08-01 at 2.22.15 PM.png] Upon checking the data structures, I saw that the segment type (vlan) and segment id (55) are associated with the "network" object and not with the "subnet" (I was under the impression that the segment type (vlan) and segment id (55) would be allocated to the "subnet").

When I create the VM instances, they always pick the IP address from the SUBNET1-2 IP range. If the segment (vlan 55) is associated with the "network", then what is the reason two "subnets" are allowed under it? Does it mean that VM instances from both these subnets would be configured under the same VLAN?

/anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-01 at 2.22.15 PM.png Type: image/png Size: 65802 bytes Desc: not available URL: From skaplons at redhat.com Sun Aug 2 11:27:58 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sun, 2 Aug 2020 13:27:58 +0200 Subject: Two subnets under same network context. In-Reply-To: References: Message-ID: <4991a915-210a-ad24-8497-fab029c1a050@redhat.com>

Hi, This is "normal". You can have many subnets (both IPv4 and IPv6) in the one network. By default Neutron will associate to the port an IP address from only one subnet of each type (IPv4/IPv6), but You can change it and tell Neutron to allocate for the port IP addresses from more than one subnet. 
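For example, something along these lines with the CLI (the network and subnet names are illustrative, just following the SUBNET1-1/SUBNET1-2 naming from your screenshot):

# Pre-create a port with one fixed IP from each subnet of the same network,
# then boot the instance on that port instead of on the network:
openstack port create --network NET1 \
    --fixed-ip subnet=SUBNET1-1 --fixed-ip subnet=SUBNET1-2 \
    port-both-subnets
openstack server create --flavor m1.small --image cirros \
    --port port-both-subnets vm-both-subnets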
If You have both IPv4 and IPv6 subnets in the network, Neutron will by default allocate one IPv4 and one IPv6 to each port. But again, You can manually tell Neutron to use e.g. only IPv6 address for specific port. Please check [1] and [2] for more details. [1] https://docs.openstack.org/neutron/latest/admin/intro-os-networking.html [2] https://docs.openstack.org/api-ref/network/v2/ W dniu 01.08.2020 o 23:36, Anil Jangam pisze: > Hi, > > I have observed that one can create two subnets under the same network > scope. See below an example of the use case. > > [image: Screen Shot 2020-08-01 at 2.22.15 PM.png] > Upon checking the data structures, I saw that the segment type (vlan) and > segment id (55) is associated with the "network" object and not with the > "subnet" (I was under impression that the segment type (vlan) and segment > id (55) would be allocated to the "subnet"). > > When I create the VM instances, they always pick the IP address from the > SUBNET1-2 IP range. If the segment (vlan 55) is associated with "network" > then what is the reason two "subnets" are allowed under it? > > Does it mean that VM instances from both these subnets would be configured > under the same VLAN? > > /anil. > -- Slawek Kaplonski Principal software engineer Red Hat From romain.chanu at univ-lyon1.fr Sun Aug 2 10:27:12 2020 From: romain.chanu at univ-lyon1.fr (CHANU ROMAIN) Date: Sun, 2 Aug 2020 10:27:12 +0000 Subject: Two subnets under same network context. In-Reply-To: References: Message-ID: <1596364031962.69223@univ-lyon1.fr> Hello, Network object is an isolation layer, it's defined by the cloud administrator: isolation type (VLAN, VXLAN...), physical NIC.. The subnet is a free value to cloud users, this mechanism allows multiples users to use same L3 networks (overlapping). So the network is used by admin to isolate the client and subnet is used by client to "isolate" his instances (webfont / db ...). Isolation works only on layer3 because all subnets will use the same layer2 (defined by admin). It's very easy to verify: boot one instance on each subnet then capture the traffic: you will see ARP trames. I dont know why Neutron drains all IP from last network but anyway the best practice is to create port then allocate to instance. Does it mean that VM instances from both these subnets would be configured under the same VLAN? > yes Best regards, Romain ________________________________ From: Anil Jangam Sent: Saturday, August 1, 2020 11:36 PM To: openstack-discuss Subject: Two subnets under same network context. Hi, I have observed that one can create two subnets under the same network scope. See below an example of the use case. [Screen Shot 2020-08-01 at 2.22.15 PM.png] Upon checking the data structures, I saw that the segment type (vlan) and segment id (55) is associated with the "network" object and not with the "subnet" (I was under impression that the segment type (vlan) and segment id (55) would be allocated to the "subnet"). When I create the VM instances, they always pick the IP address from the SUBNET1-2 IP range. If the segment (vlan 55) is associated with "network" then what is the reason two "subnets" are allowed under it? Does it mean that VM instances from both these subnets would be configured under the same VLAN? /anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Screen Shot 2020-08-01 at 2.22.15 PM.png Type: image/png Size: 65802 bytes Desc: Screen Shot 2020-08-01 at 2.22.15 PM.png URL: From gmann at ghanshyammann.com Mon Aug 3 00:10:36 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 02 Aug 2020 19:10:36 -0500 Subject: [all][tc][goals] Migrate CI/CD jobs to new Ubuntu LTS Focal: Week R-11 Update Message-ID: <173b1a7e32b.adf0e67a81133.1529455632674185621@ghanshyammann.com> Hello Everyone, Please find the week R-11 updates on 'Ubuntu Focal migration' community goal. Tracking: https://storyboard.openstack.org/#!/story/2007865 Progress: ======= * We passed the first deadline which I planned initially but looking at the failure happing it will definitely will take more time. My first goal is "zero downtime in gate", so if we finish it little late (with all repos tested) is ok. * ~80 repos gate have been tested/fixed till now. ** https://review.opendev.org/#/q/topic:migrate-to-focal+(status:abandoned+OR+status:merged) * 115 repos are under test and failing. Debugging and fixing are in progress (If you would like to help, please check your project repos if I am late to fix them): ** https://review.opendev.org/#/q/topic:migrate-to-focal+status:open * Patches ready to merge: ** https://review.opendev.org/#/q/topic:migrate-to-focal+status:open+label%3AVerified%3E%3D1%2Czuul+NOT+label%3AWorkflow%3C%3D-1 Bugs Report: ========== Summary: Total 4 (1 fixed, 3 in-progress). 1. Bug#1882521. (IN-PROGRESS) There is open bug for nova/cinder where three tempest tests are failing for volume detach operation. There is no clear root cause found yet -https://bugs.launchpad.net/cinder/+bug/1882521 We have skipped the tests in tempest base patch to proceed with the other projects testing but this is blocking things for the migration. 2. We encountered the nodeset name conflict with x/tobiko. (FIXED) nodeset conflict is resolved now and devstack provides all focal nodes now. 3. Bug#1886296. (IN-PROGRESS) pyflakes till 2.1.0 is not compatible with python 3.8 which is the default python version on ubuntu focal[1]. With pep8 job running on focal faces the issue and fail. We need to bump the pyflakes to 2.1.1 as min version to run pep8 jobs on py3.8. As of now, many projects are using old hacking version so I am explicitly adding pyflakes>=2.1.1 on the project side[2] but for the long term easy maintenance, I am doing it in 'hacking' requirements.txt[3] nd will release a new hacking version. After that project can move to new hacking and do not need to maintain pyflakes version compatibility. 4. Bug#1886298. (IN-PROGRESS) 'Markupsafe' 1.0 is not compatible with the latest version of setuptools[4], We need to bump the lower-constraint for Markupsafe to 1.1.1 to make it work. There are a few more issues[5] with lower-constraint jobs which I am debugging. What work to be done on the project side: ================================ This goal is more of testing the jobs on focal and fixing bugs if any otherwise migrate jobs by switching the nodeset to focal node sets defined in devstack. 1. Start a patch in your repo by making depends-on on either of below: devstack base patch if you are using only devstack base jobs not tempest: Depends-on: https://review.opendev.org/#/c/731207/ OR tempest base patch if you are using the tempest base job (like devstack-tempest): Depends-on: https://review.opendev.org/#/c/734700/ Both have depends-on on the series where I am moving unit/functional/doc/cover/nodejs tox jobs to focal. 
So you can test the complete gate jobs(unit/functional/doc/integration) together. This and its base patches - https://review.opendev.org/#/c/738328/ Example: https://review.opendev.org/#/c/738126/ 2. If none of your project jobs override the nodeset then above patch will be testing patch(do not merge) otherwise change the nodeset to focal. Example: https://review.opendev.org/#/c/737370/ 3. If the jobs are defined in branchless repo and override the nodeset then you need to override the branches variant to adjust the nodeset so that those jobs run on Focal on victoria onwards only. If no nodeset is overridden then devstack being branched and stable base job using bionic/xenial will take care of this. Example: https://review.opendev.org/#/c/744056/2 4. If no updates need you can abandon the testing patch (https://review.opendev.org/#/c/744341/). If it need updates then modify the same patch with proper commit msg, once it pass the gate then remove the Depends-On so that you can merge your patch before base jobs are switched to focal. This way we make sure no gate downtime in this migration. Example: https://review.opendev.org/#/c/744056/1..2//COMMIT_MSG Once we finish the testing on projects side and no failure then we will merge the devstack and tempest base patches. Important things to note: =================== * Do not forgot to add the story and task link to your patch so that we can track it smoothly. * Use gerrit topic 'migrate-to-focal' * Do not backport any of the patches. References: ========= Goal doc: https://governance.openstack.org/tc/goals/selected/victoria/migrate-ci-cd-jobs-to-ubuntu-focal.html Storyboard tracking: https://storyboard.openstack.org/#!/story/2007865 [1] https://github.com/PyCQA/pyflakes/issues/367 [2] https://review.opendev.org/#/c/739315/ [3] https://review.opendev.org/#/c/739334/ [4] https://github.com/pallets/markupsafe/issues/116 [5] https://zuul.opendev.org/t/openstack/build/7ecd9cf100194bc99b3b70fa1e6de032 -gmann From zhangbailin at inspur.com Mon Aug 3 00:10:59 2020 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Mon, 3 Aug 2020 00:10:59 +0000 Subject: =?utf-8?B?562U5aSNOiBbbGlzdHMub3BlbnN0YWNrLm9yZ+S7o+WPkV1SZTogW0dsYW5j?= =?utf-8?Q?e]_Proposing_Dan_Smith_for_glance_core?= In-Reply-To: <8635120d-11d6-136e-2581-40d3d451d1aa@gmail.com> References: <8635120d-11d6-136e-2581-40d3d451d1aa@gmail.com> Message-ID: <03ece5d405c74b2d9292301c2e3be7b8@inspur.com> +1 发件人: Jay Bryant [mailto:jungleboyj at gmail.com] 发送时间: 2020年7月31日 23:39 收件人: openstack-discuss at lists.openstack.org 主题: [lists.openstack.org代发]Re: [Glance] Proposing Dan Smith for glance core On 7/31/2020 8:10 AM, Sean McGinnis wrote: On 7/30/20 10:25 AM, Abhishek Kekane wrote: Hi All, I'd like to propose adding Dan Smith to the glance core group. Dan Smith has contributed to stabilize image import workflow as well as multiple stores of glance. He is also contributing in tempest and nova to set up CI/tempest jobs around image import and multiple stores. Being involved on the mailing-list and IRC channels, Dan is always helpful to the community and here to help. Please respond with +1/-1 until 03rd August, 2020 1400 UTC. Cheers, Abhishek +1 Not a Glance core but definitely +1 from me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Mon Aug 3 07:05:58 2020 From: zigo at debian.org (Thomas Goirand) Date: Mon, 3 Aug 2020 09:05:58 +0200 Subject: Two subnets under same network context. 
In-Reply-To: <1596364031962.69223@univ-lyon1.fr> References: <1596364031962.69223@univ-lyon1.fr> Message-ID: <01345dce-deb8-de63-4a01-bf86c6ac893e@debian.org> On 8/2/20 12:27 PM, CHANU ROMAIN wrote: > Hello, > > > Network object is an isolation layer, it's defined by the cloud > administrator: isolation type (VLAN, VXLAN...), physical NIC.. The > subnet is a free value to cloud users, this mechanism allows multiples > users to use same L3 networks (overlapping). So the network is used by > admin to isolate the client and subnet is used by client to "isolate" > his instances (webfont  / db ...). No, this isn't the way it works. Thomas From zigo at debian.org Mon Aug 3 07:14:19 2020 From: zigo at debian.org (Thomas Goirand) Date: Mon, 3 Aug 2020 09:14:19 +0200 Subject: Two subnets under same network context. In-Reply-To: References: Message-ID: On 8/1/20 11:36 PM, Anil Jangam wrote: > Hi,  > > I have observed that one can create two subnets under the same network > scope. See below an example of the use case.  > > Screen Shot 2020-08-01 at 2.22.15 PM.png > Upon checking the data structures, I saw that the segment type (vlan) > and segment id (55) is associated with the "network" object and not with > the "subnet" (I was under impression that the segment type (vlan) and > segment id (55) would be allocated to the "subnet").  > > When I create the VM instances, they always pick the IP address from the > SUBNET1-2 IP range. If the segment (vlan 55) is associated with > "network" then what is the reason two "subnets" are allowed under it?  > > Does it mean that VM instances from both these subnets would be > configured under the same VLAN?  > > /anil. Hi, If you want to use segments, with a different address range depending on where a compute is physically located (for example, a rack...), then you should first set a different name for the physical network of your nodes. This is done by tweaking these: [ml2_type_flat] flat_networks = rack-number-1 [ml2_type_vlan] network_vlan_ranges = rack-number-1 Then you can: 1/ create a network scope 2/ create a network using that scope, a vlan and "--provider-physical-network rack-number-1" and --provider-segment 3/ create a subnet pool using the network scope created above 4/ create a subnet attached to the subnet pool and network segment Then you can create more network segment + subnet couples addressing different location. Once you're done, VMs will get a different range depending on the rack they are in. 
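A rough CLI sketch of those four steps (VLAN IDs, names and ranges below are illustrative, not taken from this thread):

# 1/ address scope
openstack address scope create --ip-version 4 scope-racks

# 2/ network bound to the per-rack physnet, plus one extra segment per rack
openstack network create --provider-network-type vlan \
    --provider-physical-network rack-number-1 --provider-segment 55 net-racks
openstack network segment create --network net-racks --network-type vlan \
    --physical-network rack-number-2 --segment 56 segment-rack-2

# 3/ subnet pool tied to the address scope
openstack subnet pool create --address-scope scope-racks \
    --pool-prefix 10.55.0.0/16 pool-racks

# 4/ one subnet per segment, allocated from the pool
openstack subnet create --network net-racks --subnet-pool pool-racks \
    --prefix-length 24 --network-segment segment-rack-2 subnet-rack-2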
Cheers, Thomas Goirand (zigo) From ignaziocassano at gmail.com Mon Aug 3 07:53:37 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 3 Aug 2020 09:53:37 +0200 Subject: [openstack][stein][manila-ui] error Message-ID: Hello, I installed manila on openstack stein and it works by command line mat the manila ui does not work and in httpd error log I read: [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR django.request Internal Server Error: /dashboard/project/shares/ [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback (most recent call last): [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response = get_response(request) [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response = self.process_exception_by_middleware(e, request) [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response = wrapped_callback(request, *callback_args, **callback_kwargs) [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return view_func(request, *args, **kwargs) [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return view_func(request, *args, **kwargs) [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return view_func(request, *args, **kwargs) [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return view_func(request, *args, **kwargs) [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return view_func(request, *args, **kwargs) [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, in view [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return self.dispatch(request, *args, **kwargs) [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, in dispatch [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return handler(request, *args, **kwargs) [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled = self.construct_tables() [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in construct_tables [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled = self.handle_table(table) [Mon Aug 03 
07:45:26.697537 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in handle_table [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = self._get_data_dict() [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in _get_data_dict [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] data.extend(func()) [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in wrapped [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = cache[key] = func(*args, **kwargs) [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", line 57, in get_shares_data [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] share_nets = manila.share_network_list(self.request) [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in share_network_list [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return manilaclient(request).share_networks.list(detailed=detailed, [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] AttributeError: 'NoneType' object has no attribute 'share_networks' Please, anyone could help ? Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Mon Aug 3 08:13:31 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Mon, 3 Aug 2020 10:13:31 +0200 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? Message-ID: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> There is a trend of writing action plugins, see [0], for simple things, like just calling a module in a loop. I'm not sure that is the direction TripleO should go. If ansible is inefficient in this sort of tasks without custom python code written, we should fix ansible. Otherwise, what is the ultimate goal of that trend? Is that having only action plugins in roles and playbooks? Please kindly asking the community to stop that, make a step back and reiterate with the taken approach. Thank you. [0] https://review.opendev.org/716108 -- Best regards, Bogdan Dobrelya, Irc #bogdando From thierry at openstack.org Mon Aug 3 09:56:16 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 3 Aug 2020 11:56:16 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: <274f7289-136a-e829-bdf5-1c819355ce77@openstack.org> Sean McGinnis wrote: > Posting here to raise awareness, and start discussion about next steps. > > It appears there is no one working on Cloudkitty anymore. No patches > have been merged for several months now, including simple bot proposed > patches. It would appear no one is maintaining this project anymore. > [...] Thanks for raising this, Sean. I reached out to the maintainers at Objectif Libre to check on their status. Maybe it's just a COVID19 + summer vacancy situation... Let's see what they say. 
-- Thierry Carrez (ttx) From thierry at openstack.org Mon Aug 3 10:15:06 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 3 Aug 2020 12:15:06 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> Message-ID: <88c24f3a-7d29-aa39-ed12-803279cc90c1@openstack.org> Ken Giusti wrote: > On Mon, Jul 27, 2020 at 1:18 PM Dan Smith > wrote: >> The primary concern was about something other than nova sitting on our >> bus making calls to our internal services. I imagine that the proposal >> to bake it into oslo.messaging is for the same purpose, and I'd probably >> have the same concern. At the time I think we agreed that if we were >> going to support direct-to-service health checks, they should be teensy >> HTTP servers with oslo healthchecks middleware. Further loading down >> rabbit with those pings doesn't seem like the best plan to >> me. Especially since Nova (compute) services already check in over RPC >> periodically and the success of that is discoverable en masse through >> the API. > > While initially in favor of this feature Dan's concern has me > reconsidering this. > > Now I believe that if the purpose of this feature is to check the > operational health of a service _using_ oslo.messaging, then I'm against > it.   A naked ping to a generic service point in an application doesn't > prove the operating health of that application beyond its connection to > rabbit. While I understand the need to further avoid loading down Rabbit, I like the universality of this solution, solving a real operational issue. Obviously that creates a trade-off (further loading rabbit to get more operational insights), but nobody forces you to run those ping calls, they would be opt-in. So the proposed code in itself does not weigh down Rabbit, or make anything sit on the bus. > Connectivity monitoring between an application and rabbit is > done using the keepalive connection heartbeat mechanism built into the > rabbit protocol, which O.M. supports today. I'll let Arnaud answer, but I suspect the operational need is code-external checking of the rabbit->agent chain, not code-internal checking of the agent->rabbit chain. The heartbeat mechanism is used by the agent to keep the Rabbit connection alive, ensuring it works in most of the cases. The check described above is to catch the corner cases where it still doesn't. -- Thierry Carrez (ttx) From sshnaidm at redhat.com Mon Aug 3 10:36:12 2020 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Mon, 3 Aug 2020 13:36:12 +0300 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> Message-ID: Hi, Bogdan thanks for raising this up, although I'm not sure I understand what it is the problem with using action plugins. Action plugins are well known official extensions for Ansible, as any other plugins - callback, strategy, inventory etc [1]. It is not any hack or unsupported workaround, it's a known and official feature of Ansible. Why can't we use it? What makes it different from filter, lookup, inventory or any other plugin we already use? Action plugins are also used wide in Ansible itself, for example templates plugin is implemented with action plugin [2]. If Ansible can use it, why can't we? 
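For readers who have not written one, a minimal action plugin skeleton looks roughly like this (illustrative only - this is not the actual tripleo-ansible plugin, and the 'containers' task argument is just an example):

from ansible.plugins.action import ActionBase


class ActionModule(ActionBase):
    """Run a module several times over one already-established connection."""

    def run(self, tmp=None, task_vars=None):
        result = super(ActionModule, self).run(tmp, task_vars)
        results = []
        # _execute_module() reuses the task's already-open connection, so the
        # per-task setup overhead is paid once per host rather than once per
        # loop iteration.
        for name in self._task.args.get('containers', []):
            results.append(self._execute_module(
                module_name='podman_container',
                module_args={'name': name, 'state': 'started'},
                task_vars=task_vars))
        result['results'] = results
        result['changed'] = any(r.get('changed') for r in results)
        return result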
I don't think there is something with "fixing" Ansible, it's not a bug, this is a useful extension. What regards the mentioned action plugin for podman containers, it allows to spawn containers remotely while skipping the connection part for every cycle. I'm not sure you can "fix" Ansible not to do that, it's not a bug. We may not see the difference in a few hosts in CI, but it might be very efficient when we deploy on 100+ hosts oro even 1000+ hosts. In order to evaluate this on bigger setups to understand its value we configured both options - to use action plugin or usual module. If better performance of action plugin will be proven, we can switch to use it, if it doesn't make a difference on bigger setups - then I think we can easily switch back to using an usual module. Thanks [1] https://docs.ansible.com/ansible/latest/plugins/plugins.html [2] https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/template.py On Mon, Aug 3, 2020 at 11:19 AM Bogdan Dobrelya wrote: > There is a trend of writing action plugins, see [0], for simple things, > like just calling a module in a loop. I'm not sure that is the direction > TripleO should go. If ansible is inefficient in this sort of tasks > without custom python code written, we should fix ansible. Otherwise, > what is the ultimate goal of that trend? Is that having only action > plugins in roles and playbooks? > > Please kindly asking the community to stop that, make a step back and > reiterate with the taken approach. Thank you. > > [0] https://review.opendev.org/716108 > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Mon Aug 3 11:12:18 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 03 Aug 2020 13:12:18 +0200 Subject: [nova] If any spec freeze exception now? In-Reply-To: References: Message-ID: On Fri, Jul 31, 2020 at 17:23, Rambo wrote: > Hi,all: > I have a spec which is support volume backed server > rebuild[0].This spec was accepted in Stein, but some of the work did > not finish, so repropose it for Victoria.And this spec is depend on > the cinder reimage api [1], now the reimage api is almost all > completed. So I sincerely wish this spec will approved in Victoria. > If this spec is approved, I will achieve it at once. > I was +2 before on this spec. I see the value of it. I see that Lee, Sylvain and Sean had negative comments after my review. Also I saw that the spec was updated since. It would be good the get a re-review from those folks to see if their issues has been resolved. In general let's try to decide on the feature freeze exception for this on the weekly meeting. I hope folks will re-review the spec until Thursday. I saw you added it as a topic to the meeting agenda, thanks. (I moved that topic to the Open Discussion section) Cheers, gibi > > > Ref: > > [0]:https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild > > [1]:https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api > > Best Regards > Rambo From balazs.gibizer at est.tech Mon Aug 3 11:16:26 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 03 Aug 2020 13:16:26 +0200 Subject: [nova] Victoria Milestone 2 Spec Freeze Message-ID: Hi, Last Thursday we reached Milestone 2 which means Nova is in Spec Freeze now. 
If you have a spec close to be approved and you wish to request a spec freeze exception then please send a mail to the ML about it. We will make the final decision on the weekly meeting on Thursday. Cheers, gibi From bdobreli at redhat.com Mon Aug 3 12:25:37 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Mon, 3 Aug 2020 14:25:37 +0200 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> Message-ID: On 8/3/20 12:36 PM, Sagi Shnaidman wrote: > Hi, Bogdan > > thanks for raising this up, although I'm not sure I understand what it > is the problem with using action plugins. > Action plugins are well known official extensions for Ansible, as any > other plugins - callback, strategy, inventory etc [1]. It is not any > hack or unsupported workaround, it's a known and official feature of > Ansible. Why can't we use it? What makes it different from filter, I believe the cases that require the use of those should be justified. For the given example, that manages containers in a loop via calling a module, what the written custom callback plugin buys for us? That brings code to maintain, extra complexity, like handling possible corner cases in async mode, dry-run mode etc. But what is justification aside of looks handy? > lookup, inventory or any other plugin we already use? > Action plugins are also used wide in Ansible itself, for example > templates plugin is implemented with action plugin [2]. If Ansible can > use it, why can't we? I don't think there is something with "fixing" > Ansible, it's not a bug, this is a useful extension. > What regards the mentioned action plugin for podman containers, it > allows to spawn containers remotely while skipping the connection part > for every cycle. I'm not sure you can "fix" Ansible not to do that, it's > not a bug. We may not see the difference in a few hosts in CI, but it > might be very efficient when we deploy on 100+ hosts oro even 1000+ > hosts. In order to evaluate this on bigger setups to understand its > value we configured both options - to use action plugin or usual module. > If better performance of action plugin will be proven, we can switch to > use it, if it doesn't make a difference on bigger setups - then I think > we can easily switch back to using an usual module. > > Thanks > > [1] https://docs.ansible.com/ansible/latest/plugins/plugins.html > [2] > https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/template.py > > On Mon, Aug 3, 2020 at 11:19 AM Bogdan Dobrelya > wrote: > > There is a trend of writing action plugins, see [0], for simple things, > like just calling a module in a loop. I'm not sure that is the > direction > TripleO should go. If ansible is inefficient in this sort of tasks > without custom python code written, we should fix ansible. Otherwise, > what is the ultimate goal of that trend? Is that having only action > plugins in roles and playbooks? > > Please kindly asking the community to stop that, make a step back and > reiterate with the taken approach. Thank you. 
> > [0] https://review.opendev.org/716108 > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > Best regards > Sagi Shnaidman -- Best regards, Bogdan Dobrelya, Irc #bogdando From monika.samal at outlook.com Mon Aug 3 08:38:43 2020 From: monika.samal at outlook.com (Monika Samal) Date: Mon, 3 Aug 2020 08:38:43 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal Cc: Fabian Zimmermann ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From monika.samal at outlook.com Mon Aug 3 08:53:17 2020 From: monika.samal at outlook.com (Monika Samal) Date: Mon, 3 Aug 2020 08:53:17 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , , Message-ID: After Michael suggestion I was able to create load balancer but there is error in status. 
[cid:de900175-3754-4942-a53d-43c78e425e62] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson Cc: Fabian Zimmermann ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal Cc: Fabian Zimmermann ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 26283 bytes Desc: image.png URL: From lyarwood at redhat.com Mon Aug 3 12:55:22 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 3 Aug 2020 13:55:22 +0100 Subject: [nova] openstack-tox-lower-constraints broken Message-ID: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> Hello all, $subject, I've raised the following bug: openstack-tox-lower-constraints failing due to unmet dependency on decorator==4.0.0 https://launchpad.net/bugs/1890123 I'm trying to resolve this below but I honestly feel like I'm going around in circles: https://review.opendev.org/#/q/topic:bug/1890123 If anyone has any tooling and/or recommendations for resolving issues like this I'd appreciate it! Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From dev.faz at gmail.com Mon Aug 3 13:38:21 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 3 Aug 2020 15:38:21 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Did you check the (nova) flavor you use in octavia. Fabian Monika Samal schrieb am Mo., 3. Aug. 2020, 10:53: > After Michael suggestion I was able to create load balancer but there is > error in status. > > > > PFB the error link: > > http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ > ------------------------------ > *From:* Monika Samal > *Sent:* Monday, August 3, 2020 2:08 PM > *To:* Michael Johnson > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Thanks a ton Michael for helping me out > ------------------------------ > *From:* Michael Johnson > *Sent:* Friday, July 31, 2020 3:57 AM > *To:* Monika Samal > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Just to close the loop on this, the octavia.conf file had > "project_name = admin" instead of "project_name = service" in the > [service_auth] section. This was causing the keystone errors when > Octavia was communicating with neutron. > > I don't know if that is a bug in kolla-ansible or was just a local > configuration issue. > > Michael > > On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > > > Hello Fabian,, > > > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > > > Regards, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > Hi, > > > > just to debug, could you replace the auth_type password with v3password? > > > > And do a curl against your :5000 and :35357 urls and paste the output. > > > > Fabian > > > > Monika Samal schrieb am Do., 30. Juli 2020, > 22:15: > > > > Hello Fabian, > > > > http://paste.openstack.org/show/796477/ > > > > Thanks, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > The sections should be > > > > service_auth > > keystone_authtoken > > > > if i read the docs correctly. Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > > > Fabian > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 26283 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 26283 bytes Desc: not available URL: From sean.mcginnis at gmx.com Mon Aug 3 14:00:52 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Mon, 3 Aug 2020 09:00:52 -0500 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> Message-ID: On 8/3/20 7:55 AM, Lee Yarwood wrote: > Hello all, > > $subject, I've raised the following bug: > > openstack-tox-lower-constraints failing due to unmet dependency on decorator==4.0.0 > https://launchpad.net/bugs/1890123 > > I'm trying to resolve this below but I honestly feel like I'm going > around in circles: > > https://review.opendev.org/#/q/topic:bug/1890123 > > If anyone has any tooling and/or recommendations for resolving issues > like this I'd appreciate it! > > Cheers, This appears to be broken for everyone. I initially saw the decorator thing with Cinder, but after looking closer realized it's not that package. The root issue (or at least one level closer to the root issue, that seems to be causing the decorator failure) is that the lower-constraints are not actually being enforced. Even though the logs should it is passing "-c [path to lower-constraints.txt]". So even though things should be constrained to a lower version, presumably a version that works with a different version of decorator, pip is still installing a newer package than what the constraints should allow. There was a pip release on the 28th. Things don't look like they started failing until the 31st for us though, so either that is not it, or there was just a delay before our nodes started picking up the newer version. I tested locally, and at least with version 19.3.1, I am getting the correctly constrained packages installed. Still looking, but thought I would share in case that info triggers any ideas for anyone else. Sean From sean.mcginnis at gmx.com Mon Aug 3 14:12:47 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Mon, 3 Aug 2020 09:12:47 -0500 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> Message-ID: <6a92c9c8-9cc5-4b8e-4204-13545b40e5a2@gmx.com> > The root issue (or at least one level closer to the root issue, that > seems to be causing the decorator failure) is that the lower-constraints > are not actually being enforced. Even though the logs should it is > passing "-c [path to lower-constraints.txt]". So even though things > should be constrained to a lower version, presumably a version that > works with a different version of decorator, pip is still installing a > newer package than what the constraints should allow. > > There was a pip release on the 28th. Things don't look like they started > failing until the 31st for us though, so either that is not it, or there > was just a delay before our nodes started picking up the newer version. > > I tested locally, and at least with version 19.3.1, I am getting the > correctly constrained packages installed. > > Still looking, but thought I would share in case that info triggers any > ideas for anyone else. > I upgraded my pip and rebuilt the venv. The new pip has some good warnings emitted about some incompatible conflicts, so that part is good, but it did not change the package installation behavior. I am still able to get the correctly constrained packages installed locally on Fedora 32. 
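For reference, a minimal local check along these lines (assuming the usual lower-constraints.txt layout at the repo root) shows whether the constraints are actually being honoured:

python3 -m venv .lc-venv
.lc-venv/bin/pip install -U pip
.lc-venv/bin/pip install -c lower-constraints.txt \
    -r requirements.txt -r test-requirements.txt
.lc-venv/bin/pip freeze | grep -i decorator   # should report the pinned lower bound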
So at least so far, it doesn't appear to be a pip issue. From mnaser at vexxhost.com Mon Aug 3 14:21:33 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Aug 2020 10:21:33 -0400 Subject: [largescale-sig] RPC ping In-Reply-To: <20200727095744.GK31915@sync> References: <20200727095744.GK31915@sync> Message-ID: I have a few operational suggestions on how I think we could do this best: 1. I think exposing a healthcheck endpoint that _actually_ runs the ping and responds with a 200 OK makes a lot more sense in terms of being able to run it inside something like Kubernetes, you end up with a "who makes the ping and who responds to it" type of scenario which can be tricky though I'm sure we can figure that out 2. I've found that newer releases of RabbitMQ really help with those un-usable queues after a split, I haven't had any issues at all with newer releases, so that could be something to help your life be a lot easier. 3. You mentioned you're moving towards Kubernetes, we're doing the same and building an operator: https://opendev.org/vexxhost/openstack-operator -- Because the operator manages the whole thing and Kubernetes does it's thing too, we started moving towards 1 (single) rabbitmq per service, which reaaaaaaally helped a lot in stabilizing things. Oslo messaging is a lot better at recovering when a single service IP is pointing towards it because it doesn't do weird things like have threads trying to connect to other Rabbit ports. Just a thought. 4. In terms of telemetry and making sure you avoid that issue, we track the consumption rates of queues inside OpenStack. OpenStack consumption rate should be constant and never growing, anytime it grows, we instantly detect that something is fishy. However, the other issue comes in that when you restart any openstack service, it 'forgets' all it's existing queues and then you have a set of building up queues until they automatically expire which happens around 30 minutes-ish, so it makes that alarm of "things are not being consumed" a little noisy if you're restarting services Sorry for the wall of super unorganized text, all over the place here but thought I'd chime in with my 2 cents :) On Mon, Jul 27, 2020 at 6:04 AM Arnaud Morin wrote: > > Hey all, > > TLDR: I propose a change to oslo_messaging to allow doing a ping over RPC, > this is useful to monitor liveness of agents. > > > Few weeks ago, I proposed a patch to oslo_messaging [1], which is adding a > ping endpoint to RPC dispatcher. > It means that every openstack service which is using oslo_messaging RPC > endpoints (almosts all OpenStack services and agents - e.g. neutron > server + agents, nova + computes, etc.) will then be able to answer to a > specific "ping" call over RPC. > > I decided to propose this patch in my company mainly for 2 reasons: > 1 - we are struggling monitoring our nova compute and neutron agents in a > correct way: > > 1.1 - sometimes our agents are disconnected from RPC, but the python process > is still running. > 1.2 - sometimes the agent is still connected, but the queue / binding on > rabbit cluster is not working anymore (after a rabbit split for > example). This one is very hard to debug, because the agent is still > reporting health correctly on neutron server, but it's not able to > receive messages anymore. > > > 2 - we are trying to monitor agents running in k8s pods: > when running a python agent (neutron l3-agent for example) in a k8s pod, we > wanted to find a way to monitor if it is still live of not. 
> > > Adding a RPC ping endpoint could help us solve both these issues. > Note that we still need an external mechanism (out of OpenStack) to do this > ping. > We also think it could be nice for other OpenStackers, and especially > large scale ops. > > Feel free to comment. > > > [1] https://review.opendev.org/#/c/735385/ > > > -- > Arnaud Morin > > -- Mohammed Naser VEXXHOST, Inc. From iurygregory at gmail.com Mon Aug 3 14:42:38 2020 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 3 Aug 2020 16:42:38 +0200 Subject: [ironic] let's talk about grenade In-Reply-To: References: Message-ID: Hello Everyone, We will meet this Thursday (August 6th) at 2pm - 3pm UTC Time on bluejeans [1]. Thank you! [1] https://bluejeans.com/imelofer Em qua., 29 de jul. de 2020 às 20:37, Iury Gregory escreveu: > Hello everyone, > > Since we didn't get many responses I will keep the doodle open till Friday > =) > > Em seg., 27 de jul. de 2020 às 17:55, Iury Gregory > escreveu: > >> Hello everyone, >> >> I'm still on the fight to move our ironic-grenade-dsvm-multinode-multitenant >> to zuulv3 [1], you can find some of my findings on the etherpad [2] under `Move >> to Zuul v3 Jobs (Iurygregory)`. >> >> If you are interested in helping out we are going to schedule a meeting >> to discuss about this, please use the doodle in [3]. I will close the >> doodle on Wed July 29. >> >> Thanks! >> >> [1] https://review.opendev.org/705030 >> [2] https://etherpad.openstack.org/p/IronicWhiteBoard >> [3] https://doodle.com/poll/m69b5zwnsbgcysct >> >> -- >> >> >> *Att[]'sIury Gregory Melo Ferreira * >> *MSc in Computer Science at UFCG* >> *Part of the puppet-manager-core team in OpenStack* >> *Software Engineer at Red Hat Czech* >> *Social*: https://www.linkedin.com/in/iurygregory >> *E-mail: iurygregory at gmail.com * >> > > > -- > > > *Att[]'sIury Gregory Melo Ferreira * > *MSc in Computer Science at UFCG* > *Part of the puppet-manager-core team in OpenStack* > *Software Engineer at Red Hat Czech* > *Social*: https://www.linkedin.com/in/iurygregory > *E-mail: iurygregory at gmail.com * > -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Mon Aug 3 14:46:57 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 3 Aug 2020 16:46:57 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal schrieb am Mo., 3. Aug. 2020, 15:46: > It's registered > > Get Outlook for Android > ------------------------------ > *From:* Fabian Zimmermann > *Sent:* Monday, August 3, 2020 7:08:21 PM > *To:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Did you check the (nova) flavor you use in octavia. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 10:53: > > After Michael suggestion I was able to create load balancer but there is > error in status. 
> > > > PFB the error link: > > http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ > ------------------------------ > *From:* Monika Samal > *Sent:* Monday, August 3, 2020 2:08 PM > *To:* Michael Johnson > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Thanks a ton Michael for helping me out > ------------------------------ > *From:* Michael Johnson > *Sent:* Friday, July 31, 2020 3:57 AM > *To:* Monika Samal > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Just to close the loop on this, the octavia.conf file had > "project_name = admin" instead of "project_name = service" in the > [service_auth] section. This was causing the keystone errors when > Octavia was communicating with neutron. > > I don't know if that is a bug in kolla-ansible or was just a local > configuration issue. > > Michael > > On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > > > Hello Fabian,, > > > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > > > Regards, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > Hi, > > > > just to debug, could you replace the auth_type password with v3password? > > > > And do a curl against your :5000 and :35357 urls and paste the output. > > > > Fabian > > > > Monika Samal schrieb am Do., 30. Juli 2020, > 22:15: > > > > Hello Fabian, > > > > http://paste.openstack.org/show/796477/ > > > > Thanks, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > The sections should be > > > > service_auth > > keystone_authtoken > > > > if i read the docs correctly. Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > > > Fabian > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Mon Aug 3 15:31:52 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Mon, 3 Aug 2020 10:31:52 -0500 Subject: [release] Release countdown for week R-10 August 3 - 7 Message-ID: <20200803153152.GA3471444@sm-workstation> Development Focus ----------------- We are now past the Victoria-2 milestone, and entering the last development phase of the cycle. Teams should be focused on implementing planned work for the cycle. Now is a good time to review those plans and reprioritize anything if needed based on the what progress has been made and what looks realistic to complete in the next few weeks. General Information ------------------- Looking ahead to the end of the release cycle, please be aware of the feature freeze dates. Those vary depending on deliverable type: * General libraries (except client libraries) need to have their last feature release before Non-client library freeze (September 3). Their stable branches are cut early. 
* Client libraries (think python-*client libraries) need to have their last feature release before Client library freeze (September 10) * Deliverables following a cycle-with-rc model (that would be most services) observe a Feature freeze on that same date, September 10. Any feature addition beyond that date should be discussed on the mailing-list and get PTL approval. After feature freeze, cycle-with-rc deliverables need to produce a first release candidate (and a stable branch) before RC1 deadline (September 24) * Deliverables following cycle-with-intermediary model can release as necessary, but in all cases before Final RC deadline (October 8) Upcoming Deadlines & Dates -------------------------- Ussuri Cycle-trailing deadline: August 13 (R-9 week) Non-client library freeze: September 3 (R-6 week) Client library freeze: September 10 (R-5 week) Ussuri-3 milestone: September 10 (R-5 week) Victoria release: October 14 From johnsomor at gmail.com Mon Aug 3 15:40:40 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 3 Aug 2020 08:40:40 -0700 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann wrote: > Seems like the flavor is missing or empty '' - check for typos and enable > debug. > > Check if the nova req contains valid information/flavor. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 15:46: > >> It's registered >> >> Get Outlook for Android >> ------------------------------ >> *From:* Fabian Zimmermann >> *Sent:* Monday, August 3, 2020 7:08:21 PM >> *To:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Did you check the (nova) flavor you use in octavia. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 10:53: >> >> After Michael suggestion I was able to create load balancer but there is >> error in status. >> >> >> >> PFB the error link: >> >> http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Monday, August 3, 2020 2:08 PM >> *To:* Michael Johnson >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Thanks a ton Michael for helping me out >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Friday, July 31, 2020 3:57 AM >> *To:* Monika Samal >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Just to close the loop on this, the octavia.conf file had >> "project_name = admin" instead of "project_name = service" in the >> [service_auth] section. This was causing the keystone errors when >> Octavia was communicating with neutron. >> >> I don't know if that is a bug in kolla-ansible or was just a local >> configuration issue. 
>> >> Michael >> >> On Thu, Jul 30, 2020 at 1:39 PM Monika Samal >> wrote: >> > >> > Hello Fabian,, >> > >> > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ >> > >> > Regards, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:57 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > Hi, >> > >> > just to debug, could you replace the auth_type password with v3password? >> > >> > And do a curl against your :5000 and :35357 urls and paste the output. >> > >> > Fabian >> > >> > Monika Samal schrieb am Do., 30. Juli 2020, >> 22:15: >> > >> > Hello Fabian, >> > >> > http://paste.openstack.org/show/796477/ >> > >> > Thanks, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:38 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > The sections should be >> > >> > service_auth >> > keystone_authtoken >> > >> > if i read the docs correctly. Maybe you can just paste your config >> (remove/change passwords) to paste.openstack.org and post the link? >> > >> > Fabian >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Mon Aug 3 18:12:07 2020 From: ashlee at openstack.org (Ashlee Ferguson) Date: Mon, 3 Aug 2020 13:12:07 -0500 Subject: CFP Deadline Tomorrow - Virtual Open Infrastructure Summit Message-ID: <3353421C-E038-47F5-B68F-3828232639EB@openstack.org> Hi everyone, It’s time to submit your Open Infrastructure virtual Summit presentations[1]! The CFP deadline is tomorrow. Submit sessions featuring open source projects including Airship, Ansible, Ceph, Kata Containers, Kubernetes, ONAP, OpenStack, OPNFV, StarlingX and Zuul. As a reminder, these are the 2020 Tracks: 5G, NFV & Edge AI, Machine Learning & HPC CI/CD Container Infrastructure Getting Started Hands-on Workshops Open Development Private & Hybrid Cloud Public Cloud Security Get your presentations, panels, and workshops in before August 4 at 11:59 pm PT (August 5 at 6:59 am UTC). The content submission process for the Forum and Project Teams Gathering (PTG) will be managed separately in the upcoming months. The Summit Programming Committee has shared topics by Track for community members interested in speaking at the upcoming Summit. Check out the submission tips[2]! Then don’t forget to register[3] for the virtual Open Infrastructure Summit taking place October 19-23, 2020 at no cost to you. Need more time? Reach out to speakersupport at openstack.org with any questions or concerns. Cheers, Ashlee [1] https://cfp.openstack.org/ [2] https://superuser.openstack.org/articles/virtual-open-infrastructure-summit-cfp/ [3] https://openinfrasummit2020.eventbrite.com Ashlee Ferguson Community & Events Coordinator OpenStack Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Mon Aug 3 18:14:21 2020 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Mon, 3 Aug 2020 20:14:21 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." Message-ID: We have just updated a small OpenStack cluster to Train. 
Everything seems working, but "cinder-status upgrade check" complains that services and volumes must have a service UUID [*]. What does this exactly mean? Thanks, Massimo [*] +--------------------------------------------------------------------+ | Check: Service UUIDs | | Result: Failure | | Details: Services and volumes must have a service UUID. Please fix | | this issue by running Queens online data migrations. | -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria at vmartinezdelacruz.com Mon Aug 3 19:12:06 2020 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Mon, 3 Aug 2020 16:12:06 -0300 Subject: [manila] Doc-a-thon event coming up next Thursday (Aug 6th) In-Reply-To: References: Message-ID: Hi everybody, An update on this. We decided to take over the upstream meeting directly and start *at* the slot of the Manila weekly meeting. We will join the Jitsi bridge [0] at 3pm UTC time and start going through the list of bugs we have in [1]. There is no finish time, you can join and leave the bridge freely. We will also use IRC Freenode channel #openstack-manila if needed. If the time slot doesn't work for you (we are aware this is not a friendly slot for EMEA/APAC), you can still go through the bug list in [1], claim a bug and work on it. If things go well, we plan to do this again in a different slot so everybody that wants to collaborate can do it. Looking forward to see you there, Cheers, V [0] https://meetpad.opendev.org/ManilaV-ReleaseDocAThon [1] https://ethercalc.openstack.org/ur17jprbprxx On Fri, Jul 31, 2020 at 2:05 PM Victoria Martínez de la Cruz < victoria at vmartinezdelacruz.com> wrote: > Hi folks, > > We will be organizing a doc-a-thon next Thursday, August 6th, with the > main goal of improving our docs for the next release. We will be gathering > on our Freenode channel #openstack-manila after our weekly meeting (3pm > UTC) and also using a videoconference tool (exact details TBC) to go over a > curated list of opened doc bugs we have here [0]. > > *Your* participation is truly valued, being you an already Manila > contributor or if you are interested in contributing and you didn't know > how, so looking forward to seeing you there :) > > Cheers, > > Victoria > > [0] https://ethercalc.openstack.org/ur17jprbprxx > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gagehugo at gmail.com Mon Aug 3 19:19:07 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 3 Aug 2020 14:19:07 -0500 Subject: [openstack-helm] IRC Meeting Canceled 08/04 Message-ID: Hello everyone, Since I will be unavailable tomorrow and there's currently no agenda, I am going to cancel the meeting for tomorrow. We will meet again next week at the regular time. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria at vmartinezdelacruz.com Mon Aug 3 19:21:16 2020 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Mon, 3 Aug 2020 16:21:16 -0300 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: Hi Ignazio, How did you deploy Manila and Manila UI? Can you point me toward the docs you used? Also, which is the specific workflow you are following to reach that trace? Just opening the dashboard and clicking on the Shares tab? 
Cheers, V On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano wrote: > Hello, I installed manila on openstack stein and it works by command line > mat the manila ui does not work and in httpd error log I read: > > [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR > django.request Internal Server Error: /dashboard/project/shares/ > [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback (most > recent call last): > [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line > 41, in inner > [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response = > get_response(request) > [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, > in _get_response > [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response = > self.process_exception_by_middleware(e, request) > [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, > in _get_response > [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response = > wrapped_callback(request, *callback_args, **callback_kwargs) > [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec > [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return > view_func(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec > [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return > view_func(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec > [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return > view_func(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec > [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return > view_func(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec > [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return > view_func(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, > in view > [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return > self.dispatch(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, > in dispatch > [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return > handler(request, *args, **kwargs) > [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get > [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled = > self.construct_tables() > [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in > construct_tables > [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled = > self.handle_table(table) > [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File > 
"/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in > handle_table > [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = > self._get_data_dict() > [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in > _get_data_dict > [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] > data.extend(func()) > [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in > wrapped > [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = > cache[key] = func(*args, **kwargs) > [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", > line 57, in get_shares_data > [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] share_nets = > manila.share_network_list(self.request) > [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File > "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in > share_network_list > [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return > manilaclient(request).share_networks.list(detailed=detailed, > [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] AttributeError: > 'NoneType' object has no attribute 'share_networks' > > Please, anyone could help ? > Ignazio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Mon Aug 3 19:28:51 2020 From: aschultz at redhat.com (Alex Schultz) Date: Mon, 3 Aug 2020 13:28:51 -0600 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> Message-ID: On Mon, Aug 3, 2020 at 6:34 AM Bogdan Dobrelya wrote: > > On 8/3/20 12:36 PM, Sagi Shnaidman wrote: > > Hi, Bogdan > > > > thanks for raising this up, although I'm not sure I understand what it > > is the problem with using action plugins. > > Action plugins are well known official extensions for Ansible, as any > > other plugins - callback, strategy, inventory etc [1]. It is not any > > hack or unsupported workaround, it's a known and official feature of > > Ansible. Why can't we use it? What makes it different from filter, > > I believe the cases that require the use of those should be justified. > For the given example, that manages containers in a loop via calling a > module, what the written custom callback plugin buys for us? That brings > code to maintain, extra complexity, like handling possible corner cases > in async mode, dry-run mode etc. But what is justification aside of > looks handy? I disagree that we shouldn't use action plugins or modules. Tasks themselves are expensive at scale. We saw that when we switched away from paunch to container management in pure ansible tasks. This exposed that looping tasks are even more expensive and complex error handling and workflows are better suited for modules or action plugins than a series of tasks. This is not something to be "fixed in ansible". This is the nature of the executor and strategy related interactions. Should everything be converted to modules and plugins? no. Should everything be tasks only? no. It's a balance that must be struck between when a specific set of complex tasks need extra data processing or error handling. Switching to modules or action plugins allows us to unit test our logic. 
Using tasks do not have such a concept outside of writing complex molecule testing. IMHO it's safer to switch to modules/action plugins than writing task logic. IMHO the issue that I see with the switch to Action plugins is the increased load on the ansible "controller" node during execution. Modules may be better depending on the task being managed. But I believe with unit testing, action plugins or modules provide a cleaner and more testable solution than writing roles consisting only of tasks. > > > lookup, inventory or any other plugin we already use? > > Action plugins are also used wide in Ansible itself, for example > > templates plugin is implemented with action plugin [2]. If Ansible can > > use it, why can't we? I don't think there is something with "fixing" > > Ansible, it's not a bug, this is a useful extension. > > What regards the mentioned action plugin for podman containers, it > > allows to spawn containers remotely while skipping the connection part > > for every cycle. I'm not sure you can "fix" Ansible not to do that, it's > > not a bug. We may not see the difference in a few hosts in CI, but it > > might be very efficient when we deploy on 100+ hosts oro even 1000+ > > hosts. In order to evaluate this on bigger setups to understand its > > value we configured both options - to use action plugin or usual module. > > If better performance of action plugin will be proven, we can switch to > > use it, if it doesn't make a difference on bigger setups - then I think > > we can easily switch back to using an usual module. > > > > Thanks > > > > [1] https://docs.ansible.com/ansible/latest/plugins/plugins.html > > [2] > > https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/template.py > > > > On Mon, Aug 3, 2020 at 11:19 AM Bogdan Dobrelya > > wrote: > > > > There is a trend of writing action plugins, see [0], for simple things, > > like just calling a module in a loop. I'm not sure that is the > > direction > > TripleO should go. If ansible is inefficient in this sort of tasks > > without custom python code written, we should fix ansible. Otherwise, > > what is the ultimate goal of that trend? Is that having only action > > plugins in roles and playbooks? > > > > Please kindly asking the community to stop that, make a step back and > > reiterate with the taken approach. Thank you. > > > > [0] https://review.opendev.org/716108 > > > > > > -- > > Best regards, > > Bogdan Dobrelya, > > Irc #bogdando > > > > > > > > > > -- > > Best regards > > Sagi Shnaidman > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > From ignaziocassano at gmail.com Mon Aug 3 19:32:25 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 3 Aug 2020 21:32:25 +0200 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: Hello Victoria, I installed manila with yum on centos 7. Yes, I open the dashboard and I click on shares tab. I think the problem is I not using share networks because I am using netapp drivers without share management option. Looking at the code the dashboard check if there are shares under shared networks. My understading is that shared networks should be created only when shared management option is true. Ignazio Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < victoria at vmartinezdelacruz.com> ha scritto: > Hi Ignazio, > > How did you deploy Manila and Manila UI? Can you point me toward the docs > you used? > > Also, which is the specific workflow you are following to reach that > trace? 
Just opening the dashboard and clicking on the Shares tab? > > Cheers, > > V > > On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano > wrote: > >> Hello, I installed manila on openstack stein and it works by command line >> mat the manila ui does not work and in httpd error log I read: >> >> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >> django.request Internal Server Error: /dashboard/project/shares/ >> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback (most >> recent call last): >> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >> 41, in inner >> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response = >> get_response(request) >> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >> in _get_response >> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response = >> self.process_exception_by_middleware(e, request) >> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >> in _get_response >> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response = >> wrapped_callback(request, *callback_args, **callback_kwargs) >> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >> in view >> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >> self.dispatch(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >> in dispatch >> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >> handler(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled = >> self.construct_tables() >> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >> construct_tables >> [Mon Aug 03 07:45:26.697533 
2020] [:error] [pid 3506291] handled = >> self.handle_table(table) >> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >> handle_table >> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >> self._get_data_dict() >> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >> _get_data_dict >> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >> data.extend(func()) >> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >> wrapped >> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >> cache[key] = func(*args, **kwargs) >> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >> line 57, in get_shares_data >> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] share_nets = >> manila.share_network_list(self.request) >> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >> share_network_list >> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >> manilaclient(request).share_networks.list(detailed=detailed, >> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] AttributeError: >> 'NoneType' object has no attribute 'share_networks' >> >> Please, anyone could help ? >> Ignazio >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon Aug 3 19:34:34 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 3 Aug 2020 21:34:34 +0200 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: PS I followed installation guide under docs.openstack.org. Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < victoria at vmartinezdelacruz.com> ha scritto: > Hi Ignazio, > > How did you deploy Manila and Manila UI? Can you point me toward the docs > you used? > > Also, which is the specific workflow you are following to reach that > trace? Just opening the dashboard and clicking on the Shares tab? 
> > Cheers, > > V > > On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano > wrote: > >> Hello, I installed manila on openstack stein and it works by command line >> mat the manila ui does not work and in httpd error log I read: >> >> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >> django.request Internal Server Error: /dashboard/project/shares/ >> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback (most >> recent call last): >> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >> 41, in inner >> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response = >> get_response(request) >> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >> in _get_response >> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response = >> self.process_exception_by_middleware(e, request) >> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >> in _get_response >> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response = >> wrapped_callback(request, *callback_args, **callback_kwargs) >> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >> view_func(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >> in view >> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >> self.dispatch(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >> in dispatch >> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >> handler(request, *args, **kwargs) >> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled = >> self.construct_tables() >> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >> construct_tables >> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled = >> 
self.handle_table(table) >> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >> handle_table >> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >> self._get_data_dict() >> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >> _get_data_dict >> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >> data.extend(func()) >> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >> wrapped >> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >> cache[key] = func(*args, **kwargs) >> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >> line 57, in get_shares_data >> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] share_nets = >> manila.share_network_list(self.request) >> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >> share_network_list >> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >> manilaclient(request).share_networks.list(detailed=detailed, >> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] AttributeError: >> 'NoneType' object has no attribute 'share_networks' >> >> Please, anyone could help ? >> Ignazio >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon Aug 3 19:41:34 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 3 Aug 2020 21:41:34 +0200 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: PS ps Sorry If aI am writing again. The command: manila list let me to show shares I created with command line. The dashboard gives errors I reported in my first email. Looking at manila.py line 280 it checks shares under share networks. Ignazio Il Lun 3 Ago 2020, 21:34 Ignazio Cassano ha scritto: > PS > I followed installation guide under docs.openstack.org. > > > Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < > victoria at vmartinezdelacruz.com> ha scritto: > >> Hi Ignazio, >> >> How did you deploy Manila and Manila UI? Can you point me toward the docs >> you used? >> >> Also, which is the specific workflow you are following to reach that >> trace? Just opening the dashboard and clicking on the Shares tab? 
>> >> Cheers, >> >> V >> >> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano >> wrote: >> >>> Hello, I installed manila on openstack stein and it works by command >>> line mat the manila ui does not work and in httpd error log I read: >>> >>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>> django.request Internal Server Error: /dashboard/project/shares/ >>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback (most >>> recent call last): >>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>> 41, in inner >>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response = >>> get_response(request) >>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>> in _get_response >>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response = >>> self.process_exception_by_middleware(e, request) >>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>> in _get_response >>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response = >>> wrapped_callback(request, *callback_args, **callback_kwargs) >>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >>> view_func(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >>> view_func(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >>> view_func(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >>> view_func(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >>> view_func(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>> in view >>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >>> self.dispatch(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>> in dispatch >>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >>> handler(request, *args, **kwargs) >>> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled = >>> self.construct_tables() >>> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>> construct_tables >>> [Mon Aug 03 
07:45:26.697533 2020] [:error] [pid 3506291] handled = >>> self.handle_table(table) >>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>> handle_table >>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >>> self._get_data_dict() >>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>> _get_data_dict >>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>> data.extend(func()) >>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>> wrapped >>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >>> cache[key] = func(*args, **kwargs) >>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>> line 57, in get_shares_data >>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] share_nets >>> = manila.share_network_list(self.request) >>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>> share_network_list >>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >>> manilaclient(request).share_networks.list(detailed=detailed, >>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] AttributeError: >>> 'NoneType' object has no attribute 'share_networks' >>> >>> Please, anyone could help ? >>> Ignazio >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Aug 3 20:05:25 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 3 Aug 2020 13:05:25 -0700 Subject: [ironic][stable] Include ironic-core in ironic-stable-maint ? Message-ID: Greetings awesome humans, I have a conundrum, and largely it is over stable branch maintenance. In essence, our stable branch approvers are largely down to Dmitry, Riccardo, and Myself. I think this needs to change and I'd like to propose that we go ahead and change ironic-stable-maint to just include ironic-core in order to prevent the bottleneck and conflict and risk which this presents. I strongly believe that our existing cores would all do the right thing if presented with the question of if a change needed to be merged. So honestly I'm not concerned by this proposal. Plus, some of our sub-projects have operated this way for quite some time. Thoughts, concerns, worries? -Julia From ignaziocassano at gmail.com Mon Aug 3 20:25:55 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 3 Aug 2020 22:25:55 +0200 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: I mean I am using dhss false Il Lun 3 Ago 2020, 21:41 Ignazio Cassano ha scritto: > PS ps > Sorry If aI am writing again. > The command: > manila list let me to show shares I created with command line. > The dashboard gives errors I reported in my first email. > Looking at manila.py line 280 it checks shares under share networks. > Ignazio > > > Il Lun 3 Ago 2020, 21:34 Ignazio Cassano ha > scritto: > >> PS >> I followed installation guide under docs.openstack.org. >> >> >> Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < >> victoria at vmartinezdelacruz.com> ha scritto: >> >>> Hi Ignazio, >>> >>> How did you deploy Manila and Manila UI? 
Can you point me toward the >>> docs you used? >>> >>> Also, which is the specific workflow you are following to reach that >>> trace? Just opening the dashboard and clicking on the Shares tab? >>> >>> Cheers, >>> >>> V >>> >>> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano >>> wrote: >>> >>>> Hello, I installed manila on openstack stein and it works by command >>>> line mat the manila ui does not work and in httpd error log I read: >>>> >>>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>>> django.request Internal Server Error: /dashboard/project/shares/ >>>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback >>>> (most recent call last): >>>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>>> 41, in inner >>>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response = >>>> get_response(request) >>>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>>> in _get_response >>>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response = >>>> self.process_exception_by_middleware(e, request) >>>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>>> in _get_response >>>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response = >>>> wrapped_callback(request, *callback_args, **callback_kwargs) >>>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >>>> view_func(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >>>> view_func(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >>>> view_func(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >>>> view_func(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >>>> view_func(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>>> in view >>>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >>>> self.dispatch(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>>> in dispatch >>>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >>>> handler(request, *args, **kwargs) >>>> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>>> [Mon Aug 03 
07:45:26.697526 2020] [:error] [pid 3506291] handled = >>>> self.construct_tables() >>>> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>>> construct_tables >>>> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled = >>>> self.handle_table(table) >>>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>>> handle_table >>>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >>>> self._get_data_dict() >>>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>>> _get_data_dict >>>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>>> data.extend(func()) >>>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>>> wrapped >>>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >>>> cache[key] = func(*args, **kwargs) >>>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>>> line 57, in get_shares_data >>>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] share_nets >>>> = manila.share_network_list(self.request) >>>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>>> share_network_list >>>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >>>> manilaclient(request).share_networks.list(detailed=detailed, >>>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] >>>> AttributeError: 'NoneType' object has no attribute 'share_networks' >>>> >>>> Please, anyone could help ? >>>> Ignazio >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Aug 3 21:00:53 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 3 Aug 2020 14:00:53 -0700 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: On Mon, Aug 3, 2020 at 1:31 PM Ignazio Cassano wrote: > I mean I am using dhss false > > Il Lun 3 Ago 2020, 21:41 Ignazio Cassano ha > scritto: > >> PS ps >> Sorry If aI am writing again. >> The command: >> manila list let me to show shares I created with command line. >> The dashboard gives errors I reported in my first email. >> Looking at manila.py line 280 it checks shares under share networks. >> Ignazio >> >> >> Il Lun 3 Ago 2020, 21:34 Ignazio Cassano ha >> scritto: >> >>> PS >>> I followed installation guide under docs.openstack.org. >>> >>> >>> Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < >>> victoria at vmartinezdelacruz.com> ha scritto: >>> >>>> Hi Ignazio, >>>> >>>> How did you deploy Manila and Manila UI? Can you point me toward the >>>> docs you used? >>>> >>>> Also, which is the specific workflow you are following to reach that >>>> trace? Just opening the dashboard and clicking on the Shares tab? 
>>>> >>>> Cheers, >>>> >>>> V >>>> >>>> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano < >>>> ignaziocassano at gmail.com> wrote: >>>> >>>>> Hello, I installed manila on openstack stein and it works by command >>>>> line mat the manila ui does not work and in httpd error log I read: >>>>> >>>>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>>>> django.request Internal Server Error: /dashboard/project/shares/ >>>>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback >>>>> (most recent call last): >>>>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>>>> 41, in inner >>>>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response >>>>> = get_response(request) >>>>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>>>> in _get_response >>>>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response >>>>> = self.process_exception_by_middleware(e, request) >>>>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>>>> in _get_response >>>>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response >>>>> = wrapped_callback(request, *callback_args, **callback_kwargs) >>>>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>>>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >>>>> view_func(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >>>>> view_func(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >>>>> view_func(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>>>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >>>>> view_func(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>>>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >>>>> view_func(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>>>> in view >>>>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >>>>> self.dispatch(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>>>> in dispatch >>>>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >>>>> handler(request, *args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>>>> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled = >>>>> self.construct_tables() >>>>> [Mon Aug 03 
07:45:26.697530 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>>>> construct_tables >>>>> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled = >>>>> self.handle_table(table) >>>>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>>>> handle_table >>>>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >>>>> self._get_data_dict() >>>>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>>>> _get_data_dict >>>>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>>>> data.extend(func()) >>>>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>>>> wrapped >>>>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >>>>> cache[key] = func(*args, **kwargs) >>>>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>>>> line 57, in get_shares_data >>>>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] >>>>> share_nets = manila.share_network_list(self.request) >>>>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>>>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>>>> share_network_list >>>>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >>>>> manilaclient(request).share_networks.list(detailed=detailed, >>>>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] >>>>> AttributeError: 'NoneType' object has no attribute 'share_networks' >>>>> >>>> Looking at the error here, and the code - it could be that the UI isn't able to retrieve the manila service endpoint from the service catalog. If this is the case, you must be able to see a "DEBUG" level log in your httpd error log with "no share service configured". Do you see it? As the user you're using on horizon, can you perform "openstack catalog list" and check whether the "sharev2" service type exists in that list? > >>>>> Please, anyone could help ? >>>>> Ignazio >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon Aug 3 21:45:55 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 3 Aug 2020 23:45:55 +0200 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: Hello Goutham,tomorrow I will check the catalog. Must I enable the debug option in dashboard local_setting or in manila.conf? Thanks Ignazio Il Lun 3 Ago 2020, 23:01 Goutham Pacha Ravi ha scritto: > > > > On Mon, Aug 3, 2020 at 1:31 PM Ignazio Cassano > wrote: > >> I mean I am using dhss false >> >> Il Lun 3 Ago 2020, 21:41 Ignazio Cassano ha >> scritto: >> >>> PS ps >>> Sorry If aI am writing again. >>> The command: >>> manila list let me to show shares I created with command line. >>> The dashboard gives errors I reported in my first email. >>> Looking at manila.py line 280 it checks shares under share networks. >>> Ignazio >>> >>> >>> Il Lun 3 Ago 2020, 21:34 Ignazio Cassano ha >>> scritto: >>> >>>> PS >>>> I followed installation guide under docs.openstack.org. 
>>>> >>>> >>>> Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < >>>> victoria at vmartinezdelacruz.com> ha scritto: >>>> >>>>> Hi Ignazio, >>>>> >>>>> How did you deploy Manila and Manila UI? Can you point me toward the >>>>> docs you used? >>>>> >>>>> Also, which is the specific workflow you are following to reach that >>>>> trace? Just opening the dashboard and clicking on the Shares tab? >>>>> >>>>> Cheers, >>>>> >>>>> V >>>>> >>>>> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano < >>>>> ignaziocassano at gmail.com> wrote: >>>>> >>>>>> Hello, I installed manila on openstack stein and it works by command >>>>>> line mat the manila ui does not work and in httpd error log I read: >>>>>> >>>>>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>>>>> django.request Internal Server Error: /dashboard/project/shares/ >>>>>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback >>>>>> (most recent call last): >>>>>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>>>>> 41, in inner >>>>>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] response >>>>>> = get_response(request) >>>>>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>>>>> in _get_response >>>>>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] response >>>>>> = self.process_exception_by_middleware(e, request) >>>>>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>>>>> in _get_response >>>>>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] response >>>>>> = wrapped_callback(request, *callback_args, **callback_kwargs) >>>>>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>>>>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >>>>>> view_func(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >>>>>> view_func(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >>>>>> view_func(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>>>>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >>>>>> view_func(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>>>>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >>>>>> view_func(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>>>>> in view >>>>>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >>>>>> self.dispatch(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>>>>> 
"/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>>>>> in dispatch >>>>>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >>>>>> handler(request, *args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>>>>> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled >>>>>> = self.construct_tables() >>>>>> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>>>>> construct_tables >>>>>> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled >>>>>> = self.handle_table(table) >>>>>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>>>>> handle_table >>>>>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >>>>>> self._get_data_dict() >>>>>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>>>>> _get_data_dict >>>>>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>>>>> data.extend(func()) >>>>>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>>>>> wrapped >>>>>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >>>>>> cache[key] = func(*args, **kwargs) >>>>>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>>>>> line 57, in get_shares_data >>>>>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] >>>>>> share_nets = manila.share_network_list(self.request) >>>>>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>>>>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>>>>> share_network_list >>>>>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >>>>>> manilaclient(request).share_networks.list(detailed=detailed, >>>>>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] >>>>>> AttributeError: 'NoneType' object has no attribute 'share_networks' >>>>>> >>>>> > Looking at the error here, and the code - it could be that the UI isn't > able to retrieve the manila service endpoint from the service catalog. If > this is the case, you must be able to see a "DEBUG" level log in your httpd > error log with "no share service configured". Do you see it? > > As the user you're using on horizon, can you perform "openstack catalog > list" and check whether the "sharev2" service type exists in that list? > > >> >>>>>> Please, anyone could help ? >>>>>> Ignazio >>>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria at vmartinezdelacruz.com Tue Aug 4 00:53:09 2020 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Mon, 3 Aug 2020 21:53:09 -0300 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: In local_settings.py under openstack-dashboard. And then restart the webserver. Did you copy the enable and local files from manila-ui under Horizon's namespace? Check out https://docs.openstack.org/manila-ui/latest/install/installation.html We can continue debugging tomorrow, we will find out what is going on. 
Cheers, V On Mon, Aug 3, 2020, 6:46 PM Ignazio Cassano wrote: > Hello Goutham,tomorrow I will check the catalog. > Must I enable the debug option in dashboard local_setting or in > manila.conf? > Thanks > Ignazio > > > Il Lun 3 Ago 2020, 23:01 Goutham Pacha Ravi ha > scritto: > >> >> >> >> On Mon, Aug 3, 2020 at 1:31 PM Ignazio Cassano >> wrote: >> >>> I mean I am using dhss false >>> >>> Il Lun 3 Ago 2020, 21:41 Ignazio Cassano ha >>> scritto: >>> >>>> PS ps >>>> Sorry If aI am writing again. >>>> The command: >>>> manila list let me to show shares I created with command line. >>>> The dashboard gives errors I reported in my first email. >>>> Looking at manila.py line 280 it checks shares under share networks. >>>> Ignazio >>>> >>>> >>>> Il Lun 3 Ago 2020, 21:34 Ignazio Cassano ha >>>> scritto: >>>> >>>>> PS >>>>> I followed installation guide under docs.openstack.org. >>>>> >>>>> >>>>> Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < >>>>> victoria at vmartinezdelacruz.com> ha scritto: >>>>> >>>>>> Hi Ignazio, >>>>>> >>>>>> How did you deploy Manila and Manila UI? Can you point me toward the >>>>>> docs you used? >>>>>> >>>>>> Also, which is the specific workflow you are following to reach that >>>>>> trace? Just opening the dashboard and clicking on the Shares tab? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> V >>>>>> >>>>>> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano < >>>>>> ignaziocassano at gmail.com> wrote: >>>>>> >>>>>>> Hello, I installed manila on openstack stein and it works by command >>>>>>> line mat the manila ui does not work and in httpd error log I read: >>>>>>> >>>>>>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>>>>>> django.request Internal Server Error: /dashboard/project/shares/ >>>>>>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback >>>>>>> (most recent call last): >>>>>>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>>>>>> 41, in inner >>>>>>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] >>>>>>> response = get_response(request) >>>>>>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>>>>>> in _get_response >>>>>>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] >>>>>>> response = self.process_exception_by_middleware(e, request) >>>>>>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>>>>>> in _get_response >>>>>>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] >>>>>>> response = wrapped_callback(request, *callback_args, **callback_kwargs) >>>>>>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>>>>>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >>>>>>> view_func(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >>>>>>> view_func(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >>>>>>> 
view_func(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>>>>>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >>>>>>> view_func(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>>>>>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >>>>>>> view_func(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>>>>>> in view >>>>>>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >>>>>>> self.dispatch(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>>>>>> in dispatch >>>>>>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >>>>>>> handler(request, *args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>>>>>> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] handled >>>>>>> = self.construct_tables() >>>>>>> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>>>>>> construct_tables >>>>>>> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] handled >>>>>>> = self.handle_table(table) >>>>>>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>>>>>> handle_table >>>>>>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >>>>>>> self._get_data_dict() >>>>>>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>>>>>> _get_data_dict >>>>>>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>>>>>> data.extend(func()) >>>>>>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>>>>>> wrapped >>>>>>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value = >>>>>>> cache[key] = func(*args, **kwargs) >>>>>>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>>>>>> line 57, in get_shares_data >>>>>>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] >>>>>>> share_nets = manila.share_network_list(self.request) >>>>>>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>>>>>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>>>>>> share_network_list >>>>>>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >>>>>>> manilaclient(request).share_networks.list(detailed=detailed, >>>>>>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] >>>>>>> AttributeError: 'NoneType' object has no attribute 'share_networks' >>>>>>> >>>>>> >> Looking at the error here, and the code - it could be that the UI isn't >> able to retrieve the manila service endpoint from the service catalog. 
If >> this is the case, you must be able to see a "DEBUG" level log in your httpd >> error log with "no share service configured". Do you see it? >> >> As the user you're using on horizon, can you perform "openstack catalog >> list" and check whether the "sharev2" service type exists in that list? >> >> >>> >>>>>>> Please, anyone could help ? >>>>>>> Ignazio >>>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Mon Aug 3 21:58:53 2020 From: thomas.king at gmail.com (Thomas King) Date: Mon, 3 Aug 2020 15:58:53 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: I've been using named physical networks so long, I completely forgot using wildcards! Is this the answer???? https://docs.openstack.org/mitaka/config-reference/networking/networking_options_reference.html#modular-layer-2-ml2-flat-type-configuration-options Tom King On Tue, Jul 28, 2020 at 3:46 PM Thomas King wrote: > Ruslanas has been a tremendous help. To catch up the discussion lists... > 1. I enabled Neutron segments. > 2. I renamed the existing segments for each network so they'll make sense. > 3. I attempted to create a segment for a remote subnet (it is using DHCP > relay) and this was the error that is blocking me. This is where the docs > do not cover: > [root at sea-maas-controller ~(keystone_admin)]# openstack network segment > create --physical-network remote146-30-32 --network-type flat --network > baremetal seg-remote-146-30-32 > BadRequestException: 400: Client Error for url: > http://10.146.30.65:9696/v2.0/segments, Invalid input for operation: > physical_network 'remote146-30-32' unknown for flat provider network. > > I've asked Ruslanas to clarify how their physical networks correspond to > their remote networks. They have a single provider network and multiple > segments tied to multiple physical networks. > > However, if anyone can shine some light on this, I would greatly > appreciate it. How should neutron's configurations accommodate remote > networks<->Neutron segments when I have only one physical network > attachment for provisioning? > > Thanks! > Tom King > > On Wed, Jul 15, 2020 at 3:33 PM Thomas King wrote: > >> That helps a lot, thank you! >> >> "I use only one network..." >> This bit seems to go completely against the Neutron segments >> documentation. When you have access, please let me know if Triple-O is >> using segments or some other method. >> >> I greatly appreciate this, this is a tremendous help. >> >> Tom King >> >> On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis >> wrote: >> >>> Hi Thomas, >>> >>> I have a bit complicated setup from tripleo side :) I use only one >>> network (only ControlPlane). thanks to Harold, he helped to make it work >>> for me. >>> >>> Yes, as written in the tripleo docs for leaf networks, it use the same >>> neutron network, different subnets. so neutron network is ctlplane (I >>> think) and have ctlplane-subnet, remote-provision and remote-KI :)) that >>> generates additional lines in "ip r s" output for routing "foreign" subnets >>> through correct gw, if you would have isolated networks, by vlans and ports >>> this would apply for each subnet different gw... I believe you >>> know/understand that part. >>> >>> remote* subnets have dhcp-relay setup by network team... do not ask >>> details for that. 
I do not know how to, but can ask :) >>> >>> >>> in undercloud/tripleo i have 2 dhcp servers, one is for introspection, >>> another for provide/cleanup and deployment process. >>> >>> all of those subnets have organization level tagged networks and are >>> tagged on network devices, but they are untagged on provisioning >>> interfaces/ports, as in general pxe should be untagged, but some nic's can >>> do vlan untag on nic/bios level. but who cares!? >>> >>> I just did a brief check on your first post, I think I have simmilar >>> setup to yours :)) I will check in around 12hours :)) more deaply, as will >>> be at work :))) >>> >>> >>> P.S. sorry for wrong terms, I am bad at naming. >>> >>> >>> On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: >>> >>>> Ruslanas, that would be excellent! >>>> >>>> I will reply to you directly for details later unless the maillist >>>> would like the full thread. >>>> >>>> Some preliminary questions: >>>> >>>> - Do you have a separate physical interface for the segment(s) used >>>> for your remote subnets? >>>> The docs state each segment must have a unique physical network >>>> name, which suggests a separate physical interface for each segment unless >>>> I'm misunderstanding something. >>>> - Are your provisioning segments all on the same Neutron network? >>>> - Are you using tagged switchports or access switchports to your >>>> Ironic server(s)? >>>> >>>> Thanks, >>>> Tom King >>>> >>>> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> I have deployed that with tripleO, but now we are recabling and >>>>> redeploying it. So once I have it running I can share my configs, just name >>>>> which you want :) >>>>> >>>>> On Tue, 14 Jul 2020 at 18:40, Thomas King >>>>> wrote: >>>>> >>>>>> I have. That's the Triple-O docs and they don't go through the normal >>>>>> .conf files to explain how it works outside of Triple-O. It has some ideas >>>>>> but no running configurations. >>>>>> >>>>>> Tom King >>>>>> >>>>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >>>>>> wrote: >>>>>> >>>>>>> hi, have you checked: >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>>>> ? >>>>>>> I am following this link. I only have one network, having different >>>>>>> issues tho ;) >>>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Tue Aug 4 04:37:11 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Tue, 4 Aug 2020 10:07:11 +0530 Subject: =?UTF-8?Q?Re=3A_=5Blists=2Eopenstack=2Eorg=E4=BB=A3=E5=8F=91=5DRe=3A_=5BGlance=5D_Proposin?= =?UTF-8?Q?g_Dan_Smith_for_glance_core?= In-Reply-To: <03ece5d405c74b2d9292301c2e3be7b8@inspur.com> References: <8635120d-11d6-136e-2581-40d3d451d1aa@gmail.com> <03ece5d405c74b2d9292301c2e3be7b8@inspur.com> Message-ID: Hi All, After hearing only positive responses, I have added Dan to the Core members list. Welcome aboard Dan. Cheers, Abhishek On Mon, 3 Aug, 2020, 05:44 Brin Zhang(张百林), wrote: > +1 > > > > *发件人:* Jay Bryant [mailto:jungleboyj at gmail.com] > *发送时间:* 2020年7月31日 23:39 > *收件人:* openstack-discuss at lists.openstack.org > *主题:* [lists.openstack.org代发]Re: [Glance] Proposing Dan Smith for glance > core > > > > On 7/31/2020 8:10 AM, Sean McGinnis wrote: > > On 7/30/20 10:25 AM, Abhishek Kekane wrote: > > Hi All, > > I'd like to propose adding Dan Smith to the glance core group. 
> > > > Dan Smith has contributed to stabilize image import workflow as well as > multiple stores of glance. > > He is also contributing in tempest and nova to set up CI/tempest jobs > around image import and multiple stores. > > > > Being involved on the mailing-list and IRC channels, Dan is always helpful > to the community and here to help. > > Please respond with +1/-1 until 03rd August, 2020 1400 UTC. > > Cheers, > Abhishek > > +1 > > Not a Glance core but definitely +1 from me. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Tue Aug 4 05:02:49 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Tue, 4 Aug 2020 00:02:49 -0500 Subject: [nova] Hyper-V hosts Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814461@gmsxchsvr01.thecreation.com> Hi, I thought I'd look into support of Hyper-V hosts for Windows Server environments, but it looks like the latest cloudbase Windows Hyper-V OpenStack Installer is for Train, and nothing seems to discuss the use of Hyper-V in Windows Server 2019. Has it been abandoned? Is anyone using Hyper-V with OpenStack successfully? One of the reasons we thought we might support it is to provide nested support for VMs with GPUs and/or vGPUs, and thought this would work better than with KVM, specifically with AMD EPYC systems. It seems that when "options kvm-amd nested=1" is used in a modprobe.d config file, Windows machines lock up when started. I think this has been an issue for a while with AMD processors, but thought it was fixed recently (I don't remember where I saw this, though). Would love to hear about any experiences related to Hyper-V and/or nested hypervisor support on AMD EPYC processors. Thanks! Eric From ignaziocassano at gmail.com Tue Aug 4 05:49:48 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 4 Aug 2020 07:49:48 +0200 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: Hello Victoria and Goutham, thank you for your great help. Unfortunately I made I mistake in my ansible playbook for installing manila: it created manila services more times, so some entries in the catalog did not have an endpoint associated. I removed the duplicated service entries where catalog was absent and now it works. Many thanks Ignazio Il giorno mar 4 ago 2020 alle ore 02:53 Victoria Martínez de la Cruz < victoria at vmartinezdelacruz.com> ha scritto: > In local_settings.py under openstack-dashboard. And then restart the > webserver. > > Did you copy the enable and local files from manila-ui under Horizon's > namespace? Check out > https://docs.openstack.org/manila-ui/latest/install/installation.html > > We can continue debugging tomorrow, we will find out what is going on. > > Cheers, > > V > > > On Mon, Aug 3, 2020, 6:46 PM Ignazio Cassano > wrote: > >> Hello Goutham,tomorrow I will check the catalog. >> Must I enable the debug option in dashboard local_setting or in >> manila.conf? >> Thanks >> Ignazio >> >> >> Il Lun 3 Ago 2020, 23:01 Goutham Pacha Ravi ha >> scritto: >> >>> >>> >>> >>> On Mon, Aug 3, 2020 at 1:31 PM Ignazio Cassano >>> wrote: >>> >>>> I mean I am using dhss false >>>> >>>> Il Lun 3 Ago 2020, 21:41 Ignazio Cassano ha >>>> scritto: >>>> >>>>> PS ps >>>>> Sorry If aI am writing again. >>>>> The command: >>>>> manila list let me to show shares I created with command line. >>>>> The dashboard gives errors I reported in my first email. >>>>> Looking at manila.py line 280 it checks shares under share networks. 
>>>>> Ignazio >>>>> >>>>> >>>>> Il Lun 3 Ago 2020, 21:34 Ignazio Cassano >>>>> ha scritto: >>>>> >>>>>> PS >>>>>> I followed installation guide under docs.openstack.org. >>>>>> >>>>>> >>>>>> Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < >>>>>> victoria at vmartinezdelacruz.com> ha scritto: >>>>>> >>>>>>> Hi Ignazio, >>>>>>> >>>>>>> How did you deploy Manila and Manila UI? Can you point me toward the >>>>>>> docs you used? >>>>>>> >>>>>>> Also, which is the specific workflow you are following to reach that >>>>>>> trace? Just opening the dashboard and clicking on the Shares tab? >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> V >>>>>>> >>>>>>> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano < >>>>>>> ignaziocassano at gmail.com> wrote: >>>>>>> >>>>>>>> Hello, I installed manila on openstack stein and it works by >>>>>>>> command line mat the manila ui does not work and in httpd error log I read: >>>>>>>> >>>>>>>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>>>>>>> django.request Internal Server Error: /dashboard/project/shares/ >>>>>>>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback >>>>>>>> (most recent call last): >>>>>>>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>>>>>>> 41, in inner >>>>>>>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] >>>>>>>> response = get_response(request) >>>>>>>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>>>>>>> in _get_response >>>>>>>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] >>>>>>>> response = self.process_exception_by_middleware(e, request) >>>>>>>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>>>>>>> in _get_response >>>>>>>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] >>>>>>>> response = wrapped_callback(request, *callback_args, **callback_kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>>>>>>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] return >>>>>>>> view_func(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>>>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] return >>>>>>>> view_func(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>>>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] return >>>>>>>> view_func(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>>>>>>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] return >>>>>>>> view_func(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>>>>>>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] return >>>>>>>> view_func(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>>>>>>> 
"/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>>>>>>> in view >>>>>>>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] return >>>>>>>> self.dispatch(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>>>>>>> in dispatch >>>>>>>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] return >>>>>>>> handler(request, *args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697523 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>>>>>>> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] >>>>>>>> handled = self.construct_tables() >>>>>>>> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>>>>>>> construct_tables >>>>>>>> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] >>>>>>>> handled = self.handle_table(table) >>>>>>>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>>>>>>> handle_table >>>>>>>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data = >>>>>>>> self._get_data_dict() >>>>>>>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>>>>>>> _get_data_dict >>>>>>>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>>>>>>> data.extend(func()) >>>>>>>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>>>>>>> wrapped >>>>>>>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value >>>>>>>> = cache[key] = func(*args, **kwargs) >>>>>>>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>>>>>>> line 57, in get_shares_data >>>>>>>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] >>>>>>>> share_nets = manila.share_network_list(self.request) >>>>>>>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>>>>>>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>>>>>>> share_network_list >>>>>>>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] return >>>>>>>> manilaclient(request).share_networks.list(detailed=detailed, >>>>>>>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] >>>>>>>> AttributeError: 'NoneType' object has no attribute 'share_networks' >>>>>>>> >>>>>>> >>> Looking at the error here, and the code - it could be that the UI isn't >>> able to retrieve the manila service endpoint from the service catalog. If >>> this is the case, you must be able to see a "DEBUG" level log in your httpd >>> error log with "no share service configured". Do you see it? >>> >>> As the user you're using on horizon, can you perform "openstack catalog >>> list" and check whether the "sharev2" service type exists in that list? >>> >>> >>>> >>>>>>>> Please, anyone could help ? >>>>>>>> Ignazio >>>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Tue Aug 4 05:54:49 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Tue, 4 Aug 2020 07:54:49 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." 
In-Reply-To: References: Message-ID: Hi, i never had this issue, but did you run the post upgrade data migrations? Fabian Massimo Sgaravatto schrieb am Mo., 3. Aug. 2020, 20:21: > We have just updated a small OpenStack cluster to Train. > Everything seems working, but "cinder-status upgrade check" complains that > services and volumes must have a service UUID [*]. > What does this exactly mean? > > Thanks, Massimo > > [*] > +--------------------------------------------------------------------+ > | Check: Service UUIDs | > | Result: Failure | > | Details: Services and volumes must have a service UUID. Please fix | > | this issue by running Queens online data migrations. | > -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Tue Aug 4 07:14:00 2020 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Tue, 4 Aug 2020 09:14:00 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." In-Reply-To: References: Message-ID: Do you mean "su -s /bin/sh -c "cinder-manage db sync" cinder" ? Yes: this was run Cheers, Massimo On Tue, Aug 4, 2020 at 7:54 AM Fabian Zimmermann wrote: > Hi, > > i never had this issue, but did you run the post upgrade data migrations? > > Fabian > > Massimo Sgaravatto schrieb am Mo., 3. Aug. > 2020, 20:21: > >> We have just updated a small OpenStack cluster to Train. >> Everything seems working, but "cinder-status upgrade check" complains >> that services and volumes must have a service UUID [*]. >> What does this exactly mean? >> >> Thanks, Massimo >> >> [*] >> +--------------------------------------------------------------------+ >> | Check: Service UUIDs | >> | Result: Failure | >> | Details: Services and volumes must have a service UUID. Please fix | >> | this issue by running Queens online data migrations. | >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Tue Aug 4 07:20:09 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Tue, 4 Aug 2020 09:20:09 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." In-Reply-To: References: Message-ID: Hi, No i mean the "online data migrations" https://docs.openstack.org/cinder/rocky/upgrade.html Fabian Massimo Sgaravatto schrieb am Di., 4. Aug. 2020, 09:14: > Do you mean "su -s /bin/sh -c "cinder-manage db sync" cinder" ? > Yes: this was run > > Cheers, Massimo > > On Tue, Aug 4, 2020 at 7:54 AM Fabian Zimmermann > wrote: > >> Hi, >> >> i never had this issue, but did you run the post upgrade data migrations? >> >> Fabian >> >> Massimo Sgaravatto schrieb am Mo., 3. >> Aug. 2020, 20:21: >> >>> We have just updated a small OpenStack cluster to Train. >>> Everything seems working, but "cinder-status upgrade check" complains >>> that services and volumes must have a service UUID [*]. >>> What does this exactly mean? >>> >>> Thanks, Massimo >>> >>> [*] >>> +--------------------------------------------------------------------+ >>> | Check: Service UUIDs | >>> | Result: Failure | >>> | Details: Services and volumes must have a service UUID. Please fix | >>> | this issue by running Queens online data migrations. | >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Tue Aug 4 07:46:35 2020 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Tue, 4 Aug 2020 09:46:35 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." 
In-Reply-To: References: Message-ID: Thanks. I tried but it says there is nothing to migrate: +-----------------------------------------+--------------+-----------+ | Migration | Total Needed | Completed | +-----------------------------------------+--------------+-----------+ | untyped_snapshots_online_data_migration | 0 | 0 | | untyped_volumes_online_data_migration | 0 | 0 | +-----------------------------------------+--------------+-----------+ On Tue, Aug 4, 2020 at 9:20 AM Fabian Zimmermann wrote: > Hi, > > No i mean the "online data migrations" > > https://docs.openstack.org/cinder/rocky/upgrade.html > > Fabian > > Massimo Sgaravatto schrieb am Di., 4. Aug. > 2020, 09:14: > >> Do you mean "su -s /bin/sh -c "cinder-manage db sync" cinder" ? >> Yes: this was run >> >> Cheers, Massimo >> >> On Tue, Aug 4, 2020 at 7:54 AM Fabian Zimmermann >> wrote: >> >>> Hi, >>> >>> i never had this issue, but did you run the post upgrade data migrations? >>> >>> Fabian >>> >>> Massimo Sgaravatto schrieb am Mo., 3. >>> Aug. 2020, 20:21: >>> >>>> We have just updated a small OpenStack cluster to Train. >>>> Everything seems working, but "cinder-status upgrade check" complains >>>> that services and volumes must have a service UUID [*]. >>>> What does this exactly mean? >>>> >>>> Thanks, Massimo >>>> >>>> [*] >>>> +--------------------------------------------------------------------+ >>>> | Check: Service UUIDs | >>>> | Result: Failure | >>>> | Details: Services and volumes must have a service UUID. Please fix | >>>> | this issue by running Queens online data migrations. | >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Aug 4 08:08:19 2020 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 4 Aug 2020 09:08:19 +0100 Subject: [ironic][stable] Include ironic-core in ironic-stable-maint ? In-Reply-To: References: Message-ID: On Mon, 3 Aug 2020 at 21:06, Julia Kreger wrote: > > Greetings awesome humans, > > I have a conundrum, and largely it is over stable branch maintenance. > > In essence, our stable branch approvers are largely down to Dmitry, > Riccardo, and Myself. I think this needs to change and I'd like to > propose that we go ahead and change ironic-stable-maint to just > include ironic-core in order to prevent the bottleneck and conflict > and risk which this presents. > > I strongly believe that our existing cores would all do the right > thing if presented with the question of if a change needed to be > merged. So honestly I'm not concerned by this proposal. Plus, some of > our sub-projects have operated this way for quite some time. > > Thoughts, concerns, worries? > Makes sense to me. We operate this way in Kolla. It might be good to make sure that current cores are all aware of what 'the right thing' is, that it is written down, and that we include it in the core onboarding process. > -Julia > From mark at stackhpc.com Tue Aug 4 08:11:39 2020 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 4 Aug 2020 09:11:39 +0100 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: On Thu, 30 Jul 2020 at 14:43, Rafael Weingärtner wrote: > > We are working on it. So far we have 3 open proposals there, but we do not have enough karma to move things along. > Besides these 3 open proposals, we do have more ongoing extensions that have not yet been proposed to the community. It's good to hear you want to help improve cloudkitty, however it sounds like what is required is help with maintaining the project. 
Is that something you could be involved with? Mark > > On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis wrote: >> >> Posting here to raise awareness, and start discussion about next steps. >> >> It appears there is no one working on Cloudkitty anymore. No patches >> have been merged for several months now, including simple bot proposed >> patches. It would appear no one is maintaining this project anymore. >> >> I know there is a need out there for this type of functionality, so >> maybe this will raise awareness and get some attention to it. But >> barring that, I am wondering if we should start the process to retire >> this project. >> >> From a Victoria release perspective, it is milestone-2 week, so we >> should make a decision if any of the Cloudkitty deliverables should be >> included in this release or not. We can certainly force releases of >> whatever is the latest, but I think that is a bit risky since these >> repos have never merged the job template change for victoria and >> therefore are not even testing with Python 3.8. That is an official >> runtime for Victoria, so we run the risk of having issues with the code >> if someone runs under 3.8 but we have not tested to make sure there are >> no problems doing so. >> >> I am hoping this at least starts the discussion. I will not propose any >> release patches to remove anything until we have had a chance to discuss >> the situation. >> >> Sean >> >> > > > -- > Rafael Weingärtner From dev.faz at gmail.com Tue Aug 4 08:17:45 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Tue, 4 Aug 2020 10:17:45 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." In-Reply-To: References: Message-ID: Hmm, the err msg tells to run the queens version of the tool. Maybe something went wrong, but the db version got incremented? Just guessing. Did you try to find the commit/change that introduced the msg? Maybe it refers to the action required to fix it / or check the db online migrarions scripts what they would/should do. Fabian Massimo Sgaravatto schrieb am Di., 4. Aug. 2020, 09:46: > Thanks. > I tried but it says there is nothing to migrate: > > +-----------------------------------------+--------------+-----------+ > | Migration | Total Needed | Completed | > +-----------------------------------------+--------------+-----------+ > | untyped_snapshots_online_data_migration | 0 | 0 | > | untyped_volumes_online_data_migration | 0 | 0 | > +-----------------------------------------+--------------+-----------+ > > On Tue, Aug 4, 2020 at 9:20 AM Fabian Zimmermann > wrote: > >> Hi, >> >> No i mean the "online data migrations" >> >> https://docs.openstack.org/cinder/rocky/upgrade.html >> >> Fabian >> >> Massimo Sgaravatto schrieb am Di., 4. >> Aug. 2020, 09:14: >> >>> Do you mean "su -s /bin/sh -c "cinder-manage db sync" cinder" ? >>> Yes: this was run >>> >>> Cheers, Massimo >>> >>> On Tue, Aug 4, 2020 at 7:54 AM Fabian Zimmermann >>> wrote: >>> >>>> Hi, >>>> >>>> i never had this issue, but did you run the post upgrade data >>>> migrations? >>>> >>>> Fabian >>>> >>>> Massimo Sgaravatto schrieb am Mo., 3. >>>> Aug. 2020, 20:21: >>>> >>>>> We have just updated a small OpenStack cluster to Train. >>>>> Everything seems working, but "cinder-status upgrade check" complains >>>>> that services and volumes must have a service UUID [*]. >>>>> What does this exactly mean? 
>>>>> >>>>> Thanks, Massimo >>>>> >>>>> [*] >>>>> +--------------------------------------------------------------------+ >>>>> | Check: Service UUIDs | >>>>> | Result: Failure | >>>>> | Details: Services and volumes must have a service UUID. Please fix | >>>>> | this issue by running Queens online data migrations. | >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Tue Aug 4 08:54:39 2020 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 4 Aug 2020 10:54:39 +0200 Subject: [ironic][stable] Include ironic-core in ironic-stable-maint ? In-Reply-To: References: Message-ID: <01691703-aa8a-c24b-bc8e-7671a55d1d34@openstack.org> Julia Kreger wrote: > [...] > In essence, our stable branch approvers are largely down to Dmitry, > Riccardo, and Myself. I think this needs to change and I'd like to > propose that we go ahead and change ironic-stable-maint to just > include ironic-core in order to prevent the bottleneck and conflict > and risk which this presents. > > I strongly believe that our existing cores would all do the right > thing if presented with the question of if a change needed to be > merged. So honestly I'm not concerned by this proposal. Plus, some of > our sub-projects have operated this way for quite some time. > > Thoughts, concerns, worries? Sounds good to me. Stable branch backport approvals follow different rules from development branch changes, which is why historically we used separate groups -- so that all -core do not need to know the stable policy rules. But today -core groups evolve less quickly and can probably be taught the stable policy, so I'm not too concerned either. Maybe it's a good time to remind them of the stable policy doc though, in particular the "appropriate fixes" section: https://docs.openstack.org/project-team-guide/stable-branches.html Cheers, -- Thierry From jesse at odyssey4.me Tue Aug 4 09:23:30 2020 From: jesse at odyssey4.me (Jesse Pretorius) Date: Tue, 4 Aug 2020 09:23:30 +0000 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> Message-ID: <2231a9649ccef5f1c712ac509fb22cb199f6b88b.camel@odyssey4.me> On Mon, 2020-08-03 at 13:28 -0600, Alex Schultz wrote: On Mon, Aug 3, 2020 at 6:34 AM Bogdan Dobrelya < bdobreli at redhat.com > wrote: On 8/3/20 12:36 PM, Sagi Shnaidman wrote: Hi, Bogdan thanks for raising this up, although I'm not sure I understand what it is the problem with using action plugins. Action plugins are well known official extensions for Ansible, as any other plugins - callback, strategy, inventory etc [1]. It is not any hack or unsupported workaround, it's a known and official feature of Ansible. Why can't we use it? What makes it different from filter, I believe the cases that require the use of those should be justified. For the given example, that manages containers in a loop via calling a module, what the written custom callback plugin buys for us? That brings code to maintain, extra complexity, like handling possible corner cases in async mode, dry-run mode etc. But what is justification aside of looks handy? I disagree that we shouldn't use action plugins or modules. Tasks themselves are expensive at scale. We saw that when we switched away from paunch to container management in pure ansible tasks. 
This exposed that looping tasks are even more expensive and complex error handling and workflows are better suited for modules or action plugins than a series of tasks. This is not something to be "fixed in ansible". This is the nature of the executor and strategy related interactions. Should everything be converted to modules and plugins? no. Should everything be tasks only? no. It's a balance that must be struck between when a specific set of complex tasks need extra data processing or error handling. Switching to modules or action plugins allows us to unit test our logic. Using tasks do not have such a concept outside of writing complex molecule testing. IMHO it's safer to switch to modules/action plugins than writing task logic. I agree with Alex. Writing complex logic or trying to do error handling in tasks or jinja is not only very slow in execution, but gives us no way to properly test. Using ansible extensions like modules, action plugins, filters, etc gives us something that we can unit test, do better error handling with and therefore provides the abilty to produce a better quality result. While it is true that it does give us more of a downstream burden, our community is well versed in reading python code and testing python code properly. Sometimes it might seem easier to an author to prototype something using ansible/jinja, but if the result is complex then an extension of some kind should be considered as a iteration and it should be unit tested. I'd go as far as to say that if we add module, we should force the requirement for unit testing through some means to ensure good code quality and maintainability. Another benefit to using modules is that the Ansible tasks read more like a sequence of events that need to happen, which is exactly the spirit that Ansible has always advocated. When complex logic is implemented in tasks or in jinja, trying to follow the orchestration sequence becomes a *lot* harder. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Tue Aug 4 09:35:06 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 4 Aug 2020 11:35:06 +0200 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> Message-ID: <385dc8d7-198f-64ce-908f-49ab823ed229@redhat.com> On 8/3/20 9:28 PM, Alex Schultz wrote: > On Mon, Aug 3, 2020 at 6:34 AM Bogdan Dobrelya wrote: >> >> On 8/3/20 12:36 PM, Sagi Shnaidman wrote: >>> Hi, Bogdan >>> >>> thanks for raising this up, although I'm not sure I understand what it >>> is the problem with using action plugins. >>> Action plugins are well known official extensions for Ansible, as any >>> other plugins - callback, strategy, inventory etc [1]. It is not any >>> hack or unsupported workaround, it's a known and official feature of >>> Ansible. Why can't we use it? What makes it different from filter, >> >> I believe the cases that require the use of those should be justified. >> For the given example, that manages containers in a loop via calling a >> module, what the written custom callback plugin buys for us? That brings >> code to maintain, extra complexity, like handling possible corner cases >> in async mode, dry-run mode etc. But what is justification aside of >> looks handy? > > I disagree that we shouldn't use action plugins or modules. Tasks > themselves are expensive at scale. We saw that when we switched away > from paunch to container management in pure ansible tasks. 
This > exposed that looping tasks are even more expensive and complex error > handling and workflows are better suited for modules or action plugins > than a series of tasks. This is not something to be "fixed in > ansible". This is the nature of the executor and strategy related > interactions. Should everything be converted to modules and plugins? > no. Should everything be tasks only? no. It's a balance that must be > struck between when a specific set of complex tasks need extra data > processing or error handling. Switching to modules or action plugins > allows us to unit test our logic. Using tasks do not have such a I can understand that ansible should not be fixed for some composition tasks what require iterations and have complex logic for its "unit of work". And such ones also should be unit tested indeed. What I do not fully understand though is then what abandoning paunch for its action plugin had bought for us in the end? Paunch was self-contained and had no external dependencies on fast-changing ansible frameworks. There was also no need for paunch to handle the ansible-specific execution strategies and nuances, like "what if that action plugin is called in async or in the check-mode?" Unit tests exited in paunch as well. It was easy to backport changes within a single code base. So, looking back retrospectively, was rewriting paunch as an action plugin a simplification of the deployment framework? Please reply to yourself honestly. It does pretty same things but differently and added external framework. It is now also self-contained action plugin, since traditional tasks cannot be used in loops for this goal because of performance reasons. To summarize, action plugins may be a good solution indeed, but perhaps we should go back and use paunch instead of ansible? Same applies for *some* other tasks? That would also provide a balance, for action plugins, tasks and common sense. > concept outside of writing complex molecule testing. IMHO it's safer > to switch to modules/action plugins than writing task logic. > > IMHO the issue that I see with the switch to Action plugins is the > increased load on the ansible "controller" node during execution. > Modules may be better depending on the task being managed. But I > believe with unit testing, action plugins or modules provide a cleaner > and more testable solution than writing roles consisting only of > tasks. > > > >> >>> lookup, inventory or any other plugin we already use? >>> Action plugins are also used wide in Ansible itself, for example >>> templates plugin is implemented with action plugin [2]. If Ansible can >>> use it, why can't we? I don't think there is something with "fixing" >>> Ansible, it's not a bug, this is a useful extension. >>> What regards the mentioned action plugin for podman containers, it >>> allows to spawn containers remotely while skipping the connection part >>> for every cycle. I'm not sure you can "fix" Ansible not to do that, it's >>> not a bug. We may not see the difference in a few hosts in CI, but it >>> might be very efficient when we deploy on 100+ hosts oro even 1000+ >>> hosts. In order to evaluate this on bigger setups to understand its >>> value we configured both options - to use action plugin or usual module. >>> If better performance of action plugin will be proven, we can switch to >>> use it, if it doesn't make a difference on bigger setups - then I think >>> we can easily switch back to using an usual module. 
>>> >>> Thanks >>> >>> [1] https://docs.ansible.com/ansible/latest/plugins/plugins.html >>> [2] >>> https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/template.py >>> >>> On Mon, Aug 3, 2020 at 11:19 AM Bogdan Dobrelya >> > wrote: >>> >>> There is a trend of writing action plugins, see [0], for simple things, >>> like just calling a module in a loop. I'm not sure that is the >>> direction >>> TripleO should go. If ansible is inefficient in this sort of tasks >>> without custom python code written, we should fix ansible. Otherwise, >>> what is the ultimate goal of that trend? Is that having only action >>> plugins in roles and playbooks? >>> >>> Please kindly asking the community to stop that, make a step back and >>> reiterate with the taken approach. Thank you. >>> >>> [0] https://review.opendev.org/716108 >>> >>> >>> -- >>> Best regards, >>> Bogdan Dobrelya, >>> Irc #bogdando >>> >>> >>> >>> >>> -- >>> Best regards >>> Sagi Shnaidman >> >> >> -- >> Best regards, >> Bogdan Dobrelya, >> Irc #bogdando >> >> > -- Best regards, Bogdan Dobrelya, Irc #bogdando From sshnaidm at redhat.com Tue Aug 4 09:38:42 2020 From: sshnaidm at redhat.com (Sagi Shnaidman) Date: Tue, 4 Aug 2020 12:38:42 +0300 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: <385dc8d7-198f-64ce-908f-49ab823ed229@redhat.com> References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> <385dc8d7-198f-64ce-908f-49ab823ed229@redhat.com> Message-ID: Hi, Actually this discussion prompted me to investigate more how to optimize containers setup on a large number of hosts. As I saw from action plugin work, it still copies the module file each loop cycle, which is not very efficient behavior. That's why I started work on a podman container module which can start a bunch of containers in one call and accepts a list of containers as an input. In this case the module file will be transferred to the remote host only once and all containers execution will be done by python on the remote host. That way we'll avoid unnecessary establishing connections, copying files, setting permissions etc what happens every time we cycle over data. It will be done only once. I'll send a patch soon for review. Thanks On Tue, Aug 4, 2020 at 12:35 PM Bogdan Dobrelya wrote: > On 8/3/20 9:28 PM, Alex Schultz wrote: > > On Mon, Aug 3, 2020 at 6:34 AM Bogdan Dobrelya > wrote: > >> > >> On 8/3/20 12:36 PM, Sagi Shnaidman wrote: > >>> Hi, Bogdan > >>> > >>> thanks for raising this up, although I'm not sure I understand what it > >>> is the problem with using action plugins. > >>> Action plugins are well known official extensions for Ansible, as any > >>> other plugins - callback, strategy, inventory etc [1]. It is not any > >>> hack or unsupported workaround, it's a known and official feature of > >>> Ansible. Why can't we use it? What makes it different from filter, > >> > >> I believe the cases that require the use of those should be justified. > >> For the given example, that manages containers in a loop via calling a > >> module, what the written custom callback plugin buys for us? That brings > >> code to maintain, extra complexity, like handling possible corner cases > >> in async mode, dry-run mode etc. But what is justification aside of > >> looks handy? > > > > I disagree that we shouldn't use action plugins or modules. Tasks > > themselves are expensive at scale. We saw that when we switched away > > from paunch to container management in pure ansible tasks. 
This > > exposed that looping tasks are even more expensive and complex error > > handling and workflows are better suited for modules or action plugins > > than a series of tasks. This is not something to be "fixed in > > ansible". This is the nature of the executor and strategy related > > interactions. Should everything be converted to modules and plugins? > > no. Should everything be tasks only? no. It's a balance that must be > > struck between when a specific set of complex tasks need extra data > > processing or error handling. Switching to modules or action plugins > > allows us to unit test our logic. Using tasks do not have such a > > I can understand that ansible should not be fixed for some composition > tasks what require iterations and have complex logic for its "unit of > work". And such ones also should be unit tested indeed. What I do not > fully understand though is then what abandoning paunch for its action > plugin had bought for us in the end? > > Paunch was self-contained and had no external dependencies on > fast-changing ansible frameworks. There was also no need for paunch to > handle the ansible-specific execution strategies and nuances, like "what > if that action plugin is called in async or in the check-mode?" Unit > tests exited in paunch as well. It was easy to backport changes within a > single code base. > > So, looking back retrospectively, was rewriting paunch as an action > plugin a simplification of the deployment framework? Please reply to > yourself honestly. It does pretty same things but differently and added > external framework. It is now also self-contained action plugin, since > traditional tasks cannot be used in loops for this goal because of > performance reasons. > > To summarize, action plugins may be a good solution indeed, but perhaps > we should go back and use paunch instead of ansible? Same applies for > *some* other tasks? That would also provide a balance, for action > plugins, tasks and common sense. > > > concept outside of writing complex molecule testing. IMHO it's safer > > to switch to modules/action plugins than writing task logic. > > > > IMHO the issue that I see with the switch to Action plugins is the > > increased load on the ansible "controller" node during execution. > > Modules may be better depending on the task being managed. But I > > believe with unit testing, action plugins or modules provide a cleaner > > and more testable solution than writing roles consisting only of > > tasks. > > > > > > > >> > >>> lookup, inventory or any other plugin we already use? > >>> Action plugins are also used wide in Ansible itself, for example > >>> templates plugin is implemented with action plugin [2]. If Ansible can > >>> use it, why can't we? I don't think there is something with "fixing" > >>> Ansible, it's not a bug, this is a useful extension. > >>> What regards the mentioned action plugin for podman containers, it > >>> allows to spawn containers remotely while skipping the connection part > >>> for every cycle. I'm not sure you can "fix" Ansible not to do that, > it's > >>> not a bug. We may not see the difference in a few hosts in CI, but it > >>> might be very efficient when we deploy on 100+ hosts oro even 1000+ > >>> hosts. In order to evaluate this on bigger setups to understand its > >>> value we configured both options - to use action plugin or usual > module. 
> >>> If better performance of action plugin will be proven, we can switch to > >>> use it, if it doesn't make a difference on bigger setups - then I think > >>> we can easily switch back to using an usual module. > >>> > >>> Thanks > >>> > >>> [1] https://docs.ansible.com/ansible/latest/plugins/plugins.html > >>> [2] > >>> > https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/template.py > >>> > >>> On Mon, Aug 3, 2020 at 11:19 AM Bogdan Dobrelya >>> > wrote: > >>> > >>> There is a trend of writing action plugins, see [0], for simple > things, > >>> like just calling a module in a loop. I'm not sure that is the > >>> direction > >>> TripleO should go. If ansible is inefficient in this sort of tasks > >>> without custom python code written, we should fix ansible. > Otherwise, > >>> what is the ultimate goal of that trend? Is that having only > action > >>> plugins in roles and playbooks? > >>> > >>> Please kindly asking the community to stop that, make a step back > and > >>> reiterate with the taken approach. Thank you. > >>> > >>> [0] https://review.opendev.org/716108 > >>> > >>> > >>> -- > >>> Best regards, > >>> Bogdan Dobrelya, > >>> Irc #bogdando > >>> > >>> > >>> > >>> > >>> -- > >>> Best regards > >>> Sagi Shnaidman > >> > >> > >> -- > >> Best regards, > >> Bogdan Dobrelya, > >> Irc #bogdando > >> > >> > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > -- Best regards Sagi Shnaidman -------------- next part -------------- An HTML attachment was scrubbed... URL: From marino.mrc at gmail.com Tue Aug 4 09:55:52 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Tue, 4 Aug 2020 11:55:52 +0200 Subject: [ironic][tripleo][ussuri] Problem with bare metal provisioning and old RAID controllers Message-ID: Hi, I'm trying to install openstack Ussuri on Centos 8 hardware using tripleo. I'm using a relatively old hardware (dell PowerEdge R620) with old RAID controllers, deprecated in RHEL8/Centos8. Here is some basic information: # lspci | grep -i raid 00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05) 02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2008 [Falcon] (rev 03) I'm able to manually install centos 8 using DUD driver from here -> https://elrepo.org/linux/dud/el8/x86_64/dd-megaraid_sas-07.710.50.00-1.el8_2.elrepo.iso (basically I add inst.dd and I use an usb pendrive with iso). Is there a way to do bare metal provisioning using openstack on this kind of server? 
At the moment, when I launch "openstack overcloud node introspect --provide controller1" it doesn't recognize disks (local_gb = 0 in properties) and in inspector logs I see: Jun 22 11:12:42 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:42.261 1543 DEBUG root [-] Still waiting for the root device to appear, attempt 1 of 10 wait_for_disks /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:652 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.299 1543 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): udevadm settle execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.357 1543 DEBUG oslo_concurrency.processutils [-] CMD "udevadm settle" returned: 0 in 0.058s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.392 1543 DEBUG ironic_lib.utils [-] Execution completed, command line is "udevadm settle" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.426 1543 DEBUG ironic_lib.utils [-] Command stdout is: "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.460 1543 DEBUG ironic_lib.utils [-] Command stderr is: "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.496 1543 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path' Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.549 1543 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.647 1543 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" returned: 0 in 0.097s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.683 1543 DEBUG ironic_lib.utils [-] Execution completed, command line is "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.719 1543 DEBUG ironic_lib.utils [-] Command stdout is: "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: 2018-06-22 11:12:45.755 1543 DEBUG ironic_lib.utils [-] Command stderr is: "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 Is there a way to solve the issue? For example, can I modify ramdisk and include DUD driver? I tried this guide: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/partner_integration/overcloud_images#initrd_modifying_the_initial_ramdisks but I don't know how to include an ISO instead of an rpm packet as described in the example. 
Thank you, Marco -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Tue Aug 4 10:30:04 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Tue, 4 Aug 2020 12:30:04 +0200 Subject: [ironic][tripleo][ussuri] Problem with bare metal provisioning and old RAID controllers In-Reply-To: References: Message-ID: Hi, On Tue, Aug 4, 2020 at 11:58 AM Marco Marino wrote: > Hi, I'm trying to install openstack Ussuri on Centos 8 hardware using > tripleo. I'm using a relatively old hardware (dell PowerEdge R620) with old > RAID controllers, deprecated in RHEL8/Centos8. Here is some basic > information: > # lspci | grep -i raid > 00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset > SATA RAID Controller (rev 05) > 02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2008 [Falcon] > (rev 03) > > I'm able to manually install centos 8 using DUD driver from here -> > https://elrepo.org/linux/dud/el8/x86_64/dd-megaraid_sas-07.710.50.00-1.el8_2.elrepo.iso > (basically I add inst.dd and I use an usb pendrive with iso). > Is there a way to do bare metal provisioning using openstack on this kind > of server? At the moment, when I launch "openstack overcloud node > introspect --provide controller1" it doesn't recognize disks (local_gb = 0 > in properties) and in inspector logs I see: > Jun 22 11:12:42 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:42.261 1543 DEBUG root [-] Still waiting for the root > device to appear, attempt 1 of 10 wait_for_disks > /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:652 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.299 1543 DEBUG oslo_concurrency.processutils [-] > Running cmd (subprocess): udevadm settle execute > /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.357 1543 DEBUG oslo_concurrency.processutils [-] CMD > "udevadm settle" returned: 0 in 0.058s execute > /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.392 1543 DEBUG ironic_lib.utils [-] Execution > completed, command line is "udevadm settle" execute > /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.426 1543 DEBUG ironic_lib.utils [-] Command stdout is: > "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.460 1543 DEBUG ironic_lib.utils [-] Command stderr is: > "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.496 1543 WARNING root [-] Path /dev/disk/by-path is > inaccessible, /dev/disk/by-path/* version of block device name is > unavailable Cause: [Errno 2] No such file or directory: > '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or > directory: '/dev/disk/by-path' > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.549 1543 DEBUG oslo_concurrency.processutils [-] > Running cmd (subprocess): lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE execute > /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 
2018-06-22 11:12:45.647 1543 DEBUG oslo_concurrency.processutils [-] CMD > "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" returned: 0 in 0.097s execute > /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.683 1543 DEBUG ironic_lib.utils [-] Execution > completed, command line is "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" > execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.719 1543 DEBUG ironic_lib.utils [-] Command stdout is: > "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 > Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: > 2018-06-22 11:12:45.755 1543 DEBUG ironic_lib.utils [-] Command stderr is: > "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 > > Is there a way to solve the issue? For example, can I modify ramdisk and > include DUD driver? I tried this guide: > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/partner_integration/overcloud_images#initrd_modifying_the_initial_ramdisks > > but I don't know how to include an ISO instead of an rpm packet as > described in the example. > Indeed, I don't think you can use ISO as it is, you'll need to figure out what is inside. If it's an RPM (as I assume), you'll need to extract it and install into the ramdisk. If nothing helps, you can try building a ramdisk with CentOS 7, the (very) recent versions of ironic-python-agent-builder allow using Python 3 on CentOS 7. Dmitry > Thank you, > Marco > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marino.mrc at gmail.com Tue Aug 4 10:57:13 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Tue, 4 Aug 2020 12:57:13 +0200 Subject: [ironic][tripleo][ussuri] Problem with bare metal provisioning and old RAID controllers In-Reply-To: References: Message-ID: Here is what I did: # /usr/lib/dracut/skipcpio /home/stack/images/ironic-python-agent.initramfs | zcat | cpio -ivd | pax -r # mount dd-megaraid_sas-07.710.50.00-1.el8_2.elrepo.iso /mnt/ # rpm2cpio /mnt/rpms/x86_64/kmod-megaraid_sas-07.710.50.00-1.el8_2.elrepo.x86_64.rpm | pax -r # find . 
2>/dev/null | cpio --quiet -c -o | gzip -8 > /home/stack/images/ironic-python-agent.initramfs # chown stack: /home/stack/images/ironic-python-agent.initramfs (undercloud) [stack at undercloud ~]$ openstack overcloud image upload --update-existing --image-path /home/stack/images/ At this point I checked that agent.ramdisk in /var/lib/ironic/httpboot has an update timestamp Then (undercloud) [stack at undercloud ~]$ openstack overcloud node introspect --provide controller2 /usr/lib64/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__ return f(*args, **kwds) PLAY [Baremetal Introspection for multiple Ironic Nodes] *********************** 2020-08-04 12:32:26.684368 | ecf4bbd2-e605-20dd-3da9-000000000008 | TASK | Check for required inputs 2020-08-04 12:32:26.739797 | ecf4bbd2-e605-20dd-3da9-000000000008 | SKIPPED | Check for required inputs | localhost | item=node_uuids 2020-08-04 12:32:26.746684 | ecf4bbd2-e605-20dd-3da9-00000000000a | TASK | Set node_uuids_intro fact [WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin (): maximum recursion depth exceeded while calling a Python object 2020-08-04 12:32:26.828985 | ecf4bbd2-e605-20dd-3da9-00000000000a | OK | Set node_uuids_intro fact | localhost 2020-08-04 12:32:26.834281 | ecf4bbd2-e605-20dd-3da9-00000000000c | TASK | Notice 2020-08-04 12:32:26.911106 | ecf4bbd2-e605-20dd-3da9-00000000000c | SKIPPED | Notice | localhost 2020-08-04 12:32:26.916344 | ecf4bbd2-e605-20dd-3da9-00000000000e | TASK | Set concurrency fact 2020-08-04 12:32:26.994087 | ecf4bbd2-e605-20dd-3da9-00000000000e | OK | Set concurrency fact | localhost 2020-08-04 12:32:27.005932 | ecf4bbd2-e605-20dd-3da9-000000000010 | TASK | Check if validation enabled 2020-08-04 12:32:27.116425 | ecf4bbd2-e605-20dd-3da9-000000000010 | SKIPPED | Check if validation enabled | localhost 2020-08-04 12:32:27.129120 | ecf4bbd2-e605-20dd-3da9-000000000011 | TASK | Run Validations 2020-08-04 12:32:27.239850 | ecf4bbd2-e605-20dd-3da9-000000000011 | SKIPPED | Run Validations | localhost 2020-08-04 12:32:27.251796 | ecf4bbd2-e605-20dd-3da9-000000000012 | TASK | Fail if validations are disabled 2020-08-04 12:32:27.362050 | ecf4bbd2-e605-20dd-3da9-000000000012 | SKIPPED | Fail if validations are disabled | localhost 2020-08-04 12:32:27.373947 | ecf4bbd2-e605-20dd-3da9-000000000014 | TASK | Start baremetal introspection 2020-08-04 12:48:19.944028 | ecf4bbd2-e605-20dd-3da9-000000000014 | CHANGED | Start baremetal introspection | localhost 2020-08-04 12:48:19.966517 | ecf4bbd2-e605-20dd-3da9-000000000015 | TASK | Nodes that passed introspection 2020-08-04 12:48:20.130913 | ecf4bbd2-e605-20dd-3da9-000000000015 | OK | Nodes that passed introspection | localhost | result={ "changed": false, "msg": " 00c5e81b-1e5d-442b-b64f-597a604051f7" } 2020-08-04 12:48:20.142919 | ecf4bbd2-e605-20dd-3da9-000000000016 | TASK | Nodes that failed introspection 2020-08-04 12:48:20.305004 | ecf4bbd2-e605-20dd-3da9-000000000016 | OK | Nodes that failed introspection | localhost | result={ "changed": false, "failed_when_result": false, "msg": " All nodes completed introspection successfully!" 
} 2020-08-04 12:48:20.316860 | ecf4bbd2-e605-20dd-3da9-000000000017 | TASK | Node introspection failed and no results are provided 2020-08-04 12:48:20.427675 | ecf4bbd2-e605-20dd-3da9-000000000017 | SKIPPED | Node introspection failed and no results are provided | localhost PLAY RECAP ********************************************************************* localhost : ok=5 changed=1 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0 [WARNING]: Failure using method (v2_playbook_on_stats) in callback plugin (): _output() missing 1 required positional argument: 'color' Successfully introspected nodes: ['controller2'] Exception occured while running the command Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", line 340, in prepare_command cmdline_args = self.loader.load_file('args', string_types, encoding=None) File "/usr/lib/python3.6/site-packages/ansible_runner/loader.py", line 164, in load_file contents = parsed_data = self.get_contents(path) File "/usr/lib/python3.6/site-packages/ansible_runner/loader.py", line 98, in get_contents raise ConfigurationError('specified path does not exist %s' % path) ansible_runner.exceptions.ConfigurationError: specified path does not exist /tmp/tripleop89yr8i8/args During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 34, in run super(Command, self).run(parsed_args) File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run return super(Command, self).run(parsed_args) File "/usr/lib/python3.6/site-packages/cliff/command.py", line 187, in run return_code = self.take_action(parsed_args) or 0 File "/usr/lib/python3.6/site-packages/tripleoclient/v2/overcloud_node.py", line 210, in take_action node_uuids=parsed_args.node_uuids, File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/baremetal.py", line 134, in provide 'node_uuids': node_uuids File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 659, in run_ansible_playbook runner_config.prepare() File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", line 174, in prepare self.prepare_command() File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", line 346, in prepare_command self.command = self.generate_ansible_command() File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", line 415, in generate_ansible_command v = 'v' * self.verbosity TypeError: can't multiply sequence by non-int of type 'ClientManager' can't multiply sequence by non-int of type 'ClientManager' (undercloud) [stack at undercloud ~]$ and (undercloud) [stack at undercloud ~]$ openstack baremetal node show controller2 .... 
| properties | {'local_gb': '0', 'cpus': '24', 'cpu_arch': 'x86_64', 'memory_mb': '32768', 'capabilities': 'cpu_vt:true,cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,cpu_txt:true'} It seems that megaraid driver is correctly inserted in ramdisk: # lsinitrd /var/lib/ironic/httpboot/agent.ramdisk | grep megaraid /bin/lsinitrd: line 276: warning: command substitution: ignored null byte in input -rw-r--r-- 1 root root 50 Apr 28 21:55 etc/depmod.d/kmod-megaraid_sas.conf drwxr-xr-x 2 root root 0 Aug 4 12:13 usr/lib/modules/4.18.0-193.6.3.el8_2.x86_64/kernel/drivers/scsi/megaraid -rw-r--r-- 1 root root 68240 Aug 4 12:13 usr/lib/modules/4.18.0-193.6.3.el8_2.x86_64/kernel/drivers/scsi/megaraid/megaraid_sas.ko.xz drwxr-xr-x 2 root root 0 Apr 28 21:55 usr/lib/modules/4.18.0-193.el8.x86_64/extra/megaraid_sas -rw-r--r-- 1 root root 309505 Apr 28 21:55 usr/lib/modules/4.18.0-193.el8.x86_64/extra/megaraid_sas/megaraid_sas.ko drwxr-xr-x 2 root root 0 Apr 28 21:55 usr/share/doc/kmod-megaraid_sas-07.710.50.00 -rw-r--r-- 1 root root 18092 Apr 28 21:55 usr/share/doc/kmod-megaraid_sas-07.710.50.00/GPL-v2.0.txt -rw-r--r-- 1 root root 1152 Apr 28 21:55 usr/share/doc/kmod-megaraid_sas-07.710.50.00/greylist.txt If the solution is to use a Centos7 ramdisk, please can you give me some hint? I have no idea on how to build a new ramdisk from scratch Thank you Il giorno mar 4 ago 2020 alle ore 12:33 Dmitry Tantsur ha scritto: > Hi, > > On Tue, Aug 4, 2020 at 11:58 AM Marco Marino wrote: > >> Hi, I'm trying to install openstack Ussuri on Centos 8 hardware using >> tripleo. I'm using a relatively old hardware (dell PowerEdge R620) with old >> RAID controllers, deprecated in RHEL8/Centos8. Here is some basic >> information: >> # lspci | grep -i raid >> 00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset >> SATA RAID Controller (rev 05) >> 02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2008 [Falcon] >> (rev 03) >> >> I'm able to manually install centos 8 using DUD driver from here -> >> https://elrepo.org/linux/dud/el8/x86_64/dd-megaraid_sas-07.710.50.00-1.el8_2.elrepo.iso >> (basically I add inst.dd and I use an usb pendrive with iso). >> Is there a way to do bare metal provisioning using openstack on this kind >> of server? 
At the moment, when I launch "openstack overcloud node >> introspect --provide controller1" it doesn't recognize disks (local_gb = 0 >> in properties) and in inspector logs I see: >> Jun 22 11:12:42 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:42.261 1543 DEBUG root [-] Still waiting for the root >> device to appear, attempt 1 of 10 wait_for_disks >> /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:652 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.299 1543 DEBUG oslo_concurrency.processutils [-] >> Running cmd (subprocess): udevadm settle execute >> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.357 1543 DEBUG oslo_concurrency.processutils [-] CMD >> "udevadm settle" returned: 0 in 0.058s execute >> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.392 1543 DEBUG ironic_lib.utils [-] Execution >> completed, command line is "udevadm settle" execute >> /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.426 1543 DEBUG ironic_lib.utils [-] Command stdout is: >> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.460 1543 DEBUG ironic_lib.utils [-] Command stderr is: >> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.496 1543 WARNING root [-] Path /dev/disk/by-path is >> inaccessible, /dev/disk/by-path/* version of block device name is >> unavailable Cause: [Errno 2] No such file or directory: >> '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or >> directory: '/dev/disk/by-path' >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.549 1543 DEBUG oslo_concurrency.processutils [-] >> Running cmd (subprocess): lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE execute >> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.647 1543 DEBUG oslo_concurrency.processutils [-] CMD >> "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" returned: 0 in 0.097s execute >> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.683 1543 DEBUG ironic_lib.utils [-] Execution >> completed, command line is "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" >> execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.719 1543 DEBUG ironic_lib.utils [-] Command stdout is: >> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 >> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >> 2018-06-22 11:12:45.755 1543 DEBUG ironic_lib.utils [-] Command stderr is: >> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 >> >> Is there a way to solve the issue? For example, can I modify ramdisk and >> include DUD driver? 
I tried this guide: >> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/partner_integration/overcloud_images#initrd_modifying_the_initial_ramdisks >> >> but I don't know how to include an ISO instead of an rpm packet as >> described in the example. >> > > Indeed, I don't think you can use ISO as it is, you'll need to figure out > what is inside. If it's an RPM (as I assume), you'll need to extract it and > install into the ramdisk. > > If nothing helps, you can try building a ramdisk with CentOS 7, the (very) > recent versions of ironic-python-agent-builder allow using Python 3 on > CentOS 7. > > Dmitry > > >> Thank you, >> Marco >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From victoria at vmartinezdelacruz.com Tue Aug 4 11:43:05 2020 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Tue, 4 Aug 2020 08:43:05 -0300 Subject: [openstack][stein][manila-ui] error In-Reply-To: References: Message-ID: Glad to hear it is working ok now! Cheers, V On Tue, Aug 4, 2020 at 2:50 AM Ignazio Cassano wrote: > Hello Victoria and Goutham, thank you for your great help. > Unfortunately I made I mistake in my ansible playbook for installing > manila: it created manila services more times, so some entries in the > catalog did not have an endpoint associated. > I removed the duplicated service entries where catalog was absent and now > it works. > Many thanks > Ignazio > > Il giorno mar 4 ago 2020 alle ore 02:53 Victoria Martínez de la Cruz < > victoria at vmartinezdelacruz.com> ha scritto: > >> In local_settings.py under openstack-dashboard. And then restart the >> webserver. >> >> Did you copy the enable and local files from manila-ui under Horizon's >> namespace? Check out >> https://docs.openstack.org/manila-ui/latest/install/installation.html >> >> We can continue debugging tomorrow, we will find out what is going on. >> >> Cheers, >> >> V >> >> >> On Mon, Aug 3, 2020, 6:46 PM Ignazio Cassano >> wrote: >> >>> Hello Goutham,tomorrow I will check the catalog. >>> Must I enable the debug option in dashboard local_setting or in >>> manila.conf? >>> Thanks >>> Ignazio >>> >>> >>> Il Lun 3 Ago 2020, 23:01 Goutham Pacha Ravi ha >>> scritto: >>> >>>> >>>> >>>> >>>> On Mon, Aug 3, 2020 at 1:31 PM Ignazio Cassano < >>>> ignaziocassano at gmail.com> wrote: >>>> >>>>> I mean I am using dhss false >>>>> >>>>> Il Lun 3 Ago 2020, 21:41 Ignazio Cassano >>>>> ha scritto: >>>>> >>>>>> PS ps >>>>>> Sorry If aI am writing again. >>>>>> The command: >>>>>> manila list let me to show shares I created with command line. >>>>>> The dashboard gives errors I reported in my first email. >>>>>> Looking at manila.py line 280 it checks shares under share networks. >>>>>> Ignazio >>>>>> >>>>>> >>>>>> Il Lun 3 Ago 2020, 21:34 Ignazio Cassano >>>>>> ha scritto: >>>>>> >>>>>>> PS >>>>>>> I followed installation guide under docs.openstack.org. >>>>>>> >>>>>>> >>>>>>> Il Lun 3 Ago 2020, 21:21 Victoria Martínez de la Cruz < >>>>>>> victoria at vmartinezdelacruz.com> ha scritto: >>>>>>> >>>>>>>> Hi Ignazio, >>>>>>>> >>>>>>>> How did you deploy Manila and Manila UI? Can you point me toward >>>>>>>> the docs you used? >>>>>>>> >>>>>>>> Also, which is the specific workflow you are following to reach >>>>>>>> that trace? Just opening the dashboard and clicking on the Shares tab? 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> V >>>>>>>> >>>>>>>> On Mon, Aug 3, 2020 at 4:55 AM Ignazio Cassano < >>>>>>>> ignaziocassano at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hello, I installed manila on openstack stein and it works by >>>>>>>>> command line mat the manila ui does not work and in httpd error log I read: >>>>>>>>> >>>>>>>>> [Mon Aug 03 07:45:26.697408 2020] [:error] [pid 3506291] ERROR >>>>>>>>> django.request Internal Server Error: /dashboard/project/shares/ >>>>>>>>> [Mon Aug 03 07:45:26.697437 2020] [:error] [pid 3506291] Traceback >>>>>>>>> (most recent call last): >>>>>>>>> [Mon Aug 03 07:45:26.697442 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line >>>>>>>>> 41, in inner >>>>>>>>> [Mon Aug 03 07:45:26.697446 2020] [:error] [pid 3506291] >>>>>>>>> response = get_response(request) >>>>>>>>> [Mon Aug 03 07:45:26.697450 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, >>>>>>>>> in _get_response >>>>>>>>> [Mon Aug 03 07:45:26.697453 2020] [:error] [pid 3506291] >>>>>>>>> response = self.process_exception_by_middleware(e, request) >>>>>>>>> [Mon Aug 03 07:45:26.697466 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, >>>>>>>>> in _get_response >>>>>>>>> [Mon Aug 03 07:45:26.697471 2020] [:error] [pid 3506291] >>>>>>>>> response = wrapped_callback(request, *callback_args, **callback_kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697475 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 52, in dec >>>>>>>>> [Mon Aug 03 07:45:26.697479 2020] [:error] [pid 3506291] >>>>>>>>> return view_func(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697482 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>>>>> [Mon Aug 03 07:45:26.697485 2020] [:error] [pid 3506291] >>>>>>>>> return view_func(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697489 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 36, in dec >>>>>>>>> [Mon Aug 03 07:45:26.697492 2020] [:error] [pid 3506291] >>>>>>>>> return view_func(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697496 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 113, in dec >>>>>>>>> [Mon Aug 03 07:45:26.697499 2020] [:error] [pid 3506291] >>>>>>>>> return view_func(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697502 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/decorators.py", line 84, in dec >>>>>>>>> [Mon Aug 03 07:45:26.697506 2020] [:error] [pid 3506291] >>>>>>>>> return view_func(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697509 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 68, >>>>>>>>> in view >>>>>>>>> [Mon Aug 03 07:45:26.697513 2020] [:error] [pid 3506291] >>>>>>>>> return self.dispatch(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697516 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 88, >>>>>>>>> in dispatch >>>>>>>>> [Mon Aug 03 07:45:26.697520 2020] [:error] [pid 3506291] >>>>>>>>> return handler(request, *args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697523 
2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 223, in get >>>>>>>>> [Mon Aug 03 07:45:26.697526 2020] [:error] [pid 3506291] >>>>>>>>> handled = self.construct_tables() >>>>>>>>> [Mon Aug 03 07:45:26.697530 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 214, in >>>>>>>>> construct_tables >>>>>>>>> [Mon Aug 03 07:45:26.697533 2020] [:error] [pid 3506291] >>>>>>>>> handled = self.handle_table(table) >>>>>>>>> [Mon Aug 03 07:45:26.697537 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 123, in >>>>>>>>> handle_table >>>>>>>>> [Mon Aug 03 07:45:26.697540 2020] [:error] [pid 3506291] data >>>>>>>>> = self._get_data_dict() >>>>>>>>> [Mon Aug 03 07:45:26.697544 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/tables/views.py", line 43, in >>>>>>>>> _get_data_dict >>>>>>>>> [Mon Aug 03 07:45:26.697547 2020] [:error] [pid 3506291] >>>>>>>>> data.extend(func()) >>>>>>>>> [Mon Aug 03 07:45:26.697550 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/horizon/utils/memoized.py", line 109, in >>>>>>>>> wrapped >>>>>>>>> [Mon Aug 03 07:45:26.697554 2020] [:error] [pid 3506291] value >>>>>>>>> = cache[key] = func(*args, **kwargs) >>>>>>>>> [Mon Aug 03 07:45:26.697557 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/manila_ui/dashboards/project/shares/views.py", >>>>>>>>> line 57, in get_shares_data >>>>>>>>> [Mon Aug 03 07:45:26.697561 2020] [:error] [pid 3506291] >>>>>>>>> share_nets = manila.share_network_list(self.request) >>>>>>>>> [Mon Aug 03 07:45:26.697564 2020] [:error] [pid 3506291] File >>>>>>>>> "/usr/lib/python2.7/site-packages/manila_ui/api/manila.py", line 280, in >>>>>>>>> share_network_list >>>>>>>>> [Mon Aug 03 07:45:26.697568 2020] [:error] [pid 3506291] >>>>>>>>> return manilaclient(request).share_networks.list(detailed=detailed, >>>>>>>>> [Mon Aug 03 07:45:26.697571 2020] [:error] [pid 3506291] >>>>>>>>> AttributeError: 'NoneType' object has no attribute 'share_networks' >>>>>>>>> >>>>>>>> >>>> Looking at the error here, and the code - it could be that the UI isn't >>>> able to retrieve the manila service endpoint from the service catalog. If >>>> this is the case, you must be able to see a "DEBUG" level log in your httpd >>>> error log with "no share service configured". Do you see it? >>>> >>>> As the user you're using on horizon, can you perform "openstack catalog >>>> list" and check whether the "sharev2" service type exists in that list? >>>> >>>> >>>>> >>>>>>>>> Please, anyone could help ? >>>>>>>>> Ignazio >>>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
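P.S. For anyone else who hits this trace: a rough sketch of how to spot the kind of duplicate, endpoint-less manila service entry described above (service and type names assumed from a default install):

openstack catalog list
openstack service list --long | grep -i share
openstack endpoint list --service sharev2
# if a duplicate share/sharev2 service entry has no endpoints behind it, remove it
openstack service delete <duplicate-service-id>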
URL: From yan.y.zhao at intel.com Tue Aug 4 08:37:08 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Tue, 4 Aug 2020 16:37:08 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200730112930.6f4c5762@x1.home> References: <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200729131255.68730f68@x1.home> <20200730034104.GB32327@joy-OptiPlex-7040> <20200730112930.6f4c5762@x1.home> Message-ID: <20200804083708.GA30485@joy-OptiPlex-7040> > > yes, include a device_api field is better. > > for mdev, "device_type=vfio-mdev", is it right? > > No, vfio-mdev is not a device API, it's the driver that attaches to the > mdev bus device to expose it through vfio. The device_api exposes the > actual interface of the vfio device, it's also vfio-pci for typical > mdev devices found on x86, but may be vfio-ccw, vfio-ap, etc... See > VFIO_DEVICE_API_PCI_STRING and friends. > ok. got it. > > > > > device_id=8086591d > > > > > > Is device_id interpreted relative to device_type? How does this > > > relate to mdev_type? If we have an mdev_type, doesn't that fully > > > defined the software API? > > > > > it's parent pci id for mdev actually. > > If we need to specify the parent PCI ID then something is fundamentally > wrong with the mdev_type. The mdev_type should define a unique, > software compatible interface, regardless of the parent device IDs. If > a i915-GVTg_V5_2 means different things based on the parent device IDs, > then then different mdev_types should be reported for those parent > devices. > hmm, then do we allow vendor specific fields? or is it a must that a vendor specific field should have corresponding vendor attribute? another thing is that the definition of mdev_type in GVT only corresponds to vGPU computing ability currently, e.g. i915-GVTg_V5_2, is 1/2 of a gen9 IGD, i915-GVTg_V4_2 is 1/2 of a gen8 IGD. It is too coarse-grained to live migration compatibility. Do you think we need to update GVT's definition of mdev_type? And is there any guide in mdev_type definition? > > > > > mdev_type=i915-GVTg_V5_2 > > > > > > And how are non-mdev devices represented? > > > > > non-mdev can opt to not include this field, or as you said below, a > > vendor signature. > > > > > > > aggregator=1 > > > > > pv_mode="none+ppgtt+context" > > > > > > These are meaningless vendor specific matches afaict. > > > > > yes, pv_mode and aggregator are vendor specific fields. > > but they are important to decide whether two devices are compatible. > > pv_mode means whether a vGPU supports guest paravirtualized api. > > "none+ppgtt+context" means guest can not use pv, or use ppgtt mode pv or > > use context mode pv. > > > > > > > interface_version=3 > > > > > > Not much granularity here, I prefer Sean's previous > > > .[.bugfix] scheme. > > > > > yes, .[.bugfix] scheme may be better, but I'm not sure if > > it works for a complicated scenario. > > e.g for pv_mode, > > (1) initially, pv_mode is not supported, so it's pv_mode=none, it's 0.0.0, > > (2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0, > > indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa. > > (3) later, pv_mode=context is also supported, > > pv_mode="none+ppgtt+context", so it's 0.2.0. > > > > But if later, pv_mode=ppgtt is removed. 
pv_mode="none+context", how to > > name its version? "none+ppgtt" (0.1.0) is not compatible to > > "none+context", but "none+ppgtt+context" (0.2.0) is compatible to > > "none+context". > > If pv_mode=ppgtt is removed, then the compatible versions would be > 0.0.0 or 1.0.0, ie. the major version would be incremented due to > feature removal. > > > Maintain such scheme is painful to vendor driver. > > Migration compatibility is painful, there's no way around that. I > think the version scheme is an attempt to push some of that low level > burden on the vendor driver, otherwise the management tools need to > work on an ever growing matrix of vendor specific features which is > going to become unwieldy and is largely meaningless outside of the > vendor driver. Instead, the vendor driver can make strategic decisions > about where to continue to maintain a support burden and make explicit > decisions to maintain or break compatibility. The version scheme is a > simplification and abstraction of vendor driver features in order to > create a small, logical compatibility matrix. Compromises necessarily > need to be made for that to occur. > ok. got it. > > > > > COMPATIBLE: > > > > > device_type=pci > > > > > device_id=8086591d > > > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > > this mixed notation will be hard to parse so i would avoid that. > > > > > > Some background, Intel has been proposing aggregation as a solution to > > > how we scale mdev devices when hardware exposes large numbers of > > > assignable objects that can be composed in essentially arbitrary ways. > > > So for instance, if we have a workqueue (wq), we might have an mdev > > > type for 1wq, 2wq, 3wq,... Nwq. It's not really practical to expose a > > > discrete mdev type for each of those, so they want to define a base > > > type which is composable to other types via this aggregation. This is > > > what this substitution and tagging is attempting to accomplish. So > > > imagine this set of values for cases where it's not practical to unroll > > > the values for N discrete types. > > > > > > > > aggregator={val1}/2 > > > > > > So the {val1} above would be substituted here, though an aggregation > > > factor of 1/2 is a head scratcher... > > > > > > > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > > > > > I'm lost on this one though. I think maybe it's indicating that it's > > > compatible with any of these, so do we need to list it? Couldn't this > > > be handled by Sean's version proposal where the minor version > > > represents feature compatibility? > > yes, it's indicating that it's compatible with any of these. > > Sean's version proposal may also work, but it would be painful for > > vendor driver to maintain the versions when multiple similar features > > are involved. > > This is something vendor drivers need to consider when adding and > removing features. > > > > > > interface_version={val3:int:2,3} > > > > > > What does this turn into in a few years, 2,7,12,23,75,96,... > > > > > is a range better? > > I was really trying to point out that sparseness becomes an issue if > the vendor driver is largely disconnected from how their feature > addition and deprecation affects migration support. Thanks, > ok. we'll use the x.y.z scheme then. 
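(As a reference point for where such fields would live: the per-type attributes the current mdev sysfs ABI already exposes can be dumped with something like the following on a host whose parent device registers mdev types; purely illustrative.)

for t in /sys/class/mdev_bus/*/mdev_supported_types/*; do
    printf '%s\n  name=%s device_api=%s available_instances=%s\n' \
        "$t" \
        "$(cat "$t/name" 2>/dev/null)" \
        "$(cat "$t/device_api" 2>/dev/null)" \
        "$(cat "$t/available_instances" 2>/dev/null)"
done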
Thanks Yan From moreira.belmiro.email.lists at gmail.com Tue Aug 4 08:55:05 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 4 Aug 2020 10:55:05 +0200 Subject: [TC] [PTG] Victoria vPTG Summary of Conversations and Action Items In-Reply-To: References: Message-ID: Hi everyone, the problem described in the "OpenStack User-facing APIs" is something that we face daily in our deployment. Different CLIs for different operations. I'm really interested in driving this action item. Belmiro On Fri, Jun 12, 2020 at 9:38 PM Kendall Nelson wrote: > Hello Everyone! > > I hope you all had a productive and enjoyable PTG! While it’s still > reasonably fresh, I wanted to take a moment to summarize discussions and > actions that came out of TC discussions. > > If there is a particular action item you are interested in taking, please > reply on this thread! > > For the long version, check out the etherpad from the PTG[1]. > > Tuesday > > ====== > > Ussuri Retrospective > > ---------------------------- > > As usual we accomplished a lot. Some of the things we accomplished were > around enumerating operating systems per release (again), removing python2 > support, and adding the ideas repository. Towards the end of the release, > we had a lot of discussions around what to do with leaderless projects, the > role of PTLs, and what to do with projects that were missing PTL candidates > for the next release. We discussed office hours, their history and reason > for existence, and clarified how we can strengthen communication amongst > ourselves, the projects, and the larger community. > > TC Onboarding > > -------------------- > > It was brought up that those elected most recently (and even new members > the election before) felt like there wasn’t enough onboarding into the TC. > Through discussion about what we can do to better support returning members > is to better document the daily, weekly and monthly tasks TC members are > supposed to be doing. Kendall Nelson proposed a patch to start adding more > detail to a guide for TC members already[2]. It was also proposed that we > have a sort of mentorship or shadow program for people interested in > joining the TC or new TC members by more experienced TC members. The > discussion about the shadow/mentorship program is to be continued. > > TC/UC Merge > > ------------------ > > Thierry gave an update on the merge of the committees. The simplified > version is that the current proposal is that UC members are picked from TC > members, the UC operates within the TC, and that we are already setup for > this given the number of TC members that have AUC status. None of this > requires a by-laws change. One next step that has already begun is the > merging of the openstack-users ML into openstack-discuss ML. Other next > steps are to decide when to do the actual transition (disbanding the > separate UC, probably at the next election?) and when to setup AUC’s to be > defined as extra-ATC’s to be included in the electorate for elections. For > more detail, check out the openstack-discuss ML thread[3]. > > Wednesday > > ========= > > Help Wanted List > > ----------------------- > > We settled on a format for the job postings and have several on the list. > We talked about how often we want to look through, update or add to it. The > proposal is to do this yearly. 
We need to continue pushing on the board to > dedicate contributors at their companies to work on these items, and get > them to understand that it's an investment that will take longer than a > year in a lot of cases; interns are great, but not enough. > > TC Position on Foundation Member Community Contributions > > > ---------------------------------------------------------------------------------- > > The discussion started with a state of things today - the expectations of > platinum members, the benefits the members get being on the board and why > they should donate contributor resources for these benefits, etc. A variety > of proposals were made: either enforce or remove the minimum contribution > level, give gold members the chance to have increased visibility (perhaps > giving them some of the platinum member advantages) if they supplement > their monetary contributions with contributor contributions, etc. The > #ACTION that was decided was for Mohammed to take these ideas to the board > and see what they think. > > OpenStack User-facing APIs > > -------------------------------------- > > Users are confused about the state of the user facing API’s; they’ve been > told to use the OpenStackClient(OSC) but upon use, they discover that there > are features missing that exist in the python-*clients. Partial > implementation in the OSC is worse than if the service only used their > specific CLI. Members of the OpenStackSDK joined discussions and explained > that many of the barriers that projects used to have behind implementing > certain commands have been resolved. The proposal is to create a pop up > team and that they start with fully migrating Nova, documenting the process > and collecting any other unresolved blocking issues with the hope that one > day we can set the migration of the remaining projects as a community goal. > Supplementally, a new idea was proposed- enforcing new functionality to > services is only added to the SDK (and optionally the OSC) and not the > project’s specific CLI to stop increasing the disparity between the two. > The #ACTION here is to start the pop up team, if you are interested, please > reply! Additionally, if you disagree with this kind of enforcement, please > contact the TC as soon as possible and explain your concerns. > > PTL Role in OpenStack today & Leaderless Projects > > --------------------------------------------------------------------- > > This was a veeeeeeeerrrry long conversation that went in circles a few > times. The very short version is that we, the TC, are willing to let > project teams decide for themselves if they want to have a more > deconstructed kind of PTL role by breaking it into someone responsible for > releases and someone responsible for security issues. This new format also > comes with setting the expectation that for things like project updates and > signing up for PTG time, if someone on the team doesn’t actively take that > on, the default assumption is that the project won’t participate. The > #ACTION we need someone to take on is to write a resolution about how this > will work and how it can be done. Ideally, this would be done before the > next technical election, so that teams can choose it at that point. If you > are interested in taking on the writing of this resolution, please speak up! > > Cross Project Work > > ------------------------- > > -Pop Up Teams- > > The two teams we have right now are Encryption and Secure Consistent > Policy Groups. Both are making slow progress and will continue. 
> > > > -Reducing Community Goals Per Cycle- > > Historically we have had two goals per cycle, but for smaller teams this > can be a HUGE lift. The #ACTION is to clearly outline the documentation for > the goal proposal and selection process to clarify that selecting only one > goal is fine. No one has claimed this action item yet. > > -Victoria Goal Finalization- > > Currently, we have three proposals and one accepted goal. If we are going > to select a second goal, it needs to be done ASAP as Victoria development > has already begun. All TC members should review the last proposal > requesting selection[4]. > > -Wallaby Cycle Goal Discussion Kick Off- > > Firstly, there is a #ACTION that one or two TC members are needed to guide > the W goal selection. If you are interested, please reply to this thread! > There were a few proposed goals for VIctoria that didn’t make it that could > be the starting point for W discussions, in particular, the rootwrap goal > which would be good for operators. The OpenStackCLI might be another goal > to propose for Wallaby. > > Detecting Unmaintained Projects Early > > --------------------------------------------------- > > The TC liaisons program had been created a few releases ago, but the > initial load on TC members was large. We discussed bringing this program > back and making the project health checks happen twice a release, either > the start or end of the release and once in the middle. TC liaisons will > look at previously proposed releases, release activity of the team, the > state of tempest plugins, if regular meetings are happening, if there are > patches in progress and how busy the project’s IRC channel is to make a > determination. Since more than one liaison will be assigned to each > project, those liaisons can divvy up the work how they see fit. The other > aspect that still needs to be decided is where the health checks will be > recorded- in a wiki? In a meeting and meeting logs? That decision is still > to be continued. The current #ACTION currently unassigned is that we need > to assign liaisons for the Victoria cycle and decide when to do the first > health check. > > Friday > > ===== > > Reducing Systems and Friction to Drive Change > > ---------------------------------------------------------------- > > This was another conversation that went in circles a bit before realizing > that we should make a list of the more specific problems we want to address > and then brainstorm solutions for them. The list we created (including > things already being worked) are as follows: > > - > > TC separate from UC (solution in progress) > - > > Stable releases being approved by a separate team (solution in > progress) > - > > Making repository creation faster (especially for established project > teams) > - > > Create a process blueprint for project team mergers > - > > Requirements Team being one person > - > > Stable Team > - > > Consolidate the agent experience > - > > Figure out how to improve project <--> openstack client/sdk > interaction. > > If you feel compelled to pick one of these things up and start proposing > solutions or add to the list, please do! 
> > Monitoring in OpenStack (Ceilometer + Telemetry + Gnocchi State) > > > ----------------------------------------------------------------------------------------- > > This conversation is also ongoing, but essentially we talked about the > state of things right now- largely they are not well maintained and there > is added complexity with Ceilometers being partially dependent on Gnocchi. > There are a couple of ideas to look into like using oslo.metrics for the > interface between all the tools or using Ceilometer without Gnocchi if we > can clean up those dependencies. No specific action items here, just please > share your thoughts if you have them. > > Ideas Repo Next Steps > > ------------------------------- > > Out of the Ussuri retrospective, it was brought up that we probably needed > to talk a little more about what we wanted for this repo. Essentially we > just want it to be a place to collect ideas into without worrying about the > how. It should be a place to document ideas we have had (old and new) and > keep all the discussion in one place as opposed to historic email threads, > meetings logs, other IRC logs, etc. We decided it would be good to > periodically go through this repo, likely as a forum session at a summit to > see if there is any updating that could happen or promotion of ideas to > community goals, etc. > > ‘tc:approved-release’ Tag > > --------------------------------- > > This topic was proposed by the Manila team from a discussion they had > earlier in the week. We talked about the history of the tag and how usage > of tags has evolved. At this point, the proposal is to remove the tag as > anything in the releases repo is essentially tc-approved. Ghanshyam has > volunteered to document this and do the removal. The board also needs to be > notified of this and to look at projects.yaml in the governance repo as the > source of truth for TC approved projects. The unassigned #ACTION item is to > review remaining tags and see if there are others that need to be > modified/removed/added to drive common behavior across OpenSack > components. > > Board Proposals > > ---------------------- > > This was a pretty quick summary of all discussions we had that had any > impact on the board and largely decided who would mention them. > > > > Session Feedback > > ------------------------ > > This was also a pretty quick topic compared to many of the others, we > talked about how things went across all our discussions (largely we called > the PTG a success) logistically. We tried to make good use of the raising > hands feature which mostly worked, but it lacks context and its possible > that the conversation has moved on by the time it’s your turn (if you even > remember what you want to say). > > OpenStack 2.0: k8s Native > > ----------------------------------- > > This topic was brought up at the end of our time so we didn’t have time to > discuss it really. Basically Mohammed wanted to start the conversation > about adding k8s as a base service[5] and what we would do if a project > proposed required k8s. Adding services that work with k8s could open a door > to new innovation in OpenStack. Obviously this topic will need to be > discussed further as we barely got started before we had to wrap things up. > > > So. 
> > > The tldr; > > > Here are the #ACTION items we need owners for: > > - > > Start the User Facing API Pop Up Team > - > > Write a resolution about how the deconstructed PTL roles will work > - > > Update Goal Selection docs to explain that one or more goals is fine; > it doesn’t have to be more than one > - > > Two volunteers to start the W goal selection process > - > > Assign two TC liaisons per project > - > > Review Tags to make sure they are still good for driving common > behavior across all openstack projects > > > Here are the things EVERYONE needs to do: > > - > > Review last goal proposal so that we can decide to accept or reject it > for the V release[4] > - > > Add systems that are barriers to progress in openstack to the Reducing > Systems and Friction list > - > > Continue conversations you find important > > > > Thanks everyone for your hard work and great conversations :) > > Enjoy the attached (photoshopped) team photo :) > > -Kendall (diablo_rojo) > > > > [1] TC PTG Etherpad: https://etherpad.opendev.org/p/tc-victoria-ptg > > [2] TC Guide Patch: https://review.opendev.org/#/c/732983/ > > [3] UC TC Merge Thread: > http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014736.html > > > [4] Proposed V Goal: https://review.opendev.org/#/c/731213/ > > [5] Base Service Description: > https://governance.openstack.org/tc/reference/base-services.html > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Tue Aug 4 12:22:19 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 4 Aug 2020 07:22:19 -0500 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." In-Reply-To: References: Message-ID: On 8/4/20 3:17 AM, Fabian Zimmermann wrote: > Hmm, the err msg tells to run the queens version of the tool. > > Maybe something went wrong, but the db version got incremented? Just > guessing. > > Did you try to find the commit/change that introduced the msg? [snip] > Massimo Sgaravatto > schrieb am Mo., > 3. Aug. 2020, 20:21: > > We have just updated a small OpenStack cluster to > Train. > Everything seems working, but "cinder-status > upgrade check" complains that services and volumes > must have a service UUID [*]. > What does this exactly mean? > > Thanks, Massimo > > [*] > +--------------------------------------------------------------------+ > | Check: Service UUIDs                         | > | Result: Failure                          | > | Details: Services and volumes must have a > service UUID. Please fix | > |   this issue by running Queens online data > migrations.             | > Hmm, this does look concerning. If you are now on Train but a migration is missing from Queens, that would seem to indicate some migrations were missed along the way. Were migrations run in each release prior to getting to Train? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From geguileo at redhat.com Tue Aug 4 12:26:06 2020 From: geguileo at redhat.com (Gorka Eguileor) Date: Tue, 4 Aug 2020 14:26:06 +0200 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> Message-ID: <20200804122606.dctnfvxqytfv22ws@localhost> On 03/08, Sean McGinnis wrote: > On 8/3/20 7:55 AM, Lee Yarwood wrote: > > Hello all, > > > > $subject, I've raised the following bug: > > > > openstack-tox-lower-constraints failing due to unmet dependency on decorator==4.0.0 > > https://launchpad.net/bugs/1890123 > > > > I'm trying to resolve this below but I honestly feel like I'm going > > around in circles: > > > > https://review.opendev.org/#/q/topic:bug/1890123 > > > > If anyone has any tooling and/or recommendations for resolving issues > > like this I'd appreciate it! > > > > Cheers, > > This appears to be broken for everyone. I initially saw the decorator > thing with Cinder, but after looking closer realized it's not that package. > > The root issue (or at least one level closer to the root issue, that > seems to be causing the decorator failure) is that the lower-constraints > are not actually being enforced. Even though the logs should it is > passing "-c [path to lower-constraints.txt]". So even though things > should be constrained to a lower version, presumably a version that > works with a different version of decorator, pip is still installing a > newer package than what the constraints should allow. > > There was a pip release on the 28th. Things don't look like they started > failing until the 31st for us though, so either that is not it, or there > was just a delay before our nodes started picking up the newer version. > > I tested locally, and at least with version 19.3.1, I am getting the > correctly constrained packages installed. > > Still looking, but thought I would share in case that info triggers any > ideas for anyone else. > > Sean > > Hi, Looking at one of my patches I see that the right version of dogpile.cache==0.6.5 is being installed [1], but then at another step we download [2] and install [3] version 1.0.1, and we can see that pip is actually complaining that we have incompatibilities [4]. As far as I can see this is because in that pip install we requested to wipe existing installed packages [6] and we are not passing any constraints in that call. I don't know why or where we are doing that though. Cheers, Gorka. 
[1]: https://zuul.opendev.org/t/openstack/build/49f226f8efb94c088cb2b22c46565d97/log/tox/lower-constraints-1.log#235-236 [2]: https://zuul.opendev.org/t/openstack/build/49f226f8efb94c088cb2b22c46565d97/log/tox/lower-constraints-2.log#148-149 [3]: https://zuul.opendev.org/t/openstack/build/49f226f8efb94c088cb2b22c46565d97/log/tox/lower-constraints-2.log#168-174 [4]: https://zuul.opendev.org/t/openstack/build/49f226f8efb94c088cb2b22c46565d97/log/tox/lower-constraints-2.log#202-203 [5]: https://zuul.opendev.org/t/openstack/build/49f226f8efb94c088cb2b22c46565d97/log/tox/lower-constraints-2.log#3 From fungi at yuggoth.org Tue Aug 4 12:39:22 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Aug 2020 12:39:22 +0000 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: <20200804122606.dctnfvxqytfv22ws@localhost> References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> <20200804122606.dctnfvxqytfv22ws@localhost> Message-ID: <20200804123922.tcxglphtn6x2yona@yuggoth.org> On 2020-08-04 14:26:06 +0200 (+0200), Gorka Eguileor wrote: [...] > Looking at one of my patches I see that the right version of > dogpile.cache==0.6.5 is being installed [1], but then at another step we > download [2] and install [3] version 1.0.1, and we can see that pip is > actually complaining that we have incompatibilities [4]. > > As far as I can see this is because in that pip install we requested to > wipe existing installed packages [6] and we are not passing any > constraints in that call. > > I don't know why or where we are doing that though. [...] Yes, I started digging into this yesterday too. It's affecting all tox jobs, not just lower-constraints jobs (upper-constraints is close enough to unconstrained that this isn't immediately apparent for master branch jobs, but the divergence becomes obvious in stable branch jobs and it's breaking lots of them). It seems this started roughly a week ago. I don't think we're explicitly doing it, this seems to be a behavior baked into tox itself. Most projects are currently applying constraints via the deps parameter in their tox.ini, and tox appears to invoke pip twice: once to install your deps, and then a second time to install the project being tested. The latter phase does not use the deps parameter, and so no constraints get applied. We might be able to work around this by going back to overriding install_command and putting the -c option there instead, but I haven't had an opportunity to test that theory yet. If anyone else has time to pursue this line of investigation, I'd be curious to hear whether it helps. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sean.mcginnis at gmx.com Tue Aug 4 13:01:16 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 4 Aug 2020 08:01:16 -0500 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." In-Reply-To: References: Message-ID: <199d0036-99c2-0be0-c464-39dcda8368ca@gmx.com> [adding back the ML] On 8/4/20 7:48 AM, Massimo Sgaravatto wrote: > I am afraid I never ran the online data migration > > This cluster ran Ocata > Then we updated to Rocky. We went though Pike and Queens but just to > run the db-syncs > Then we updated from Rocky to train (again, we went though Stein  but > just to run the db-syncs) > > Am I in troubles now ? 
> > Thanks, Massimo I know some folks were handling parts of this by running each version in a container. That may be an option to quickly go through the DB migrations. Let's see if anyone responds with any tips to make this easy. From massimo.sgaravatto at gmail.com Tue Aug 4 13:08:51 2020 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Tue, 4 Aug 2020 15:08:51 +0200 Subject: [ops] [cinder[ "Services and volumes must have a service UUID." In-Reply-To: <199d0036-99c2-0be0-c464-39dcda8368ca@gmx.com> References: <199d0036-99c2-0be0-c464-39dcda8368ca@gmx.com> Message-ID: Shouldn't the db sync fail if a needed online data migrations was not done ? PS: Updating is becoming a nightmare: some services now require online data migration, while for others only the db syncs should be done. On Tue, Aug 4, 2020 at 3:01 PM Sean McGinnis wrote: > [adding back the ML] > > On 8/4/20 7:48 AM, Massimo Sgaravatto wrote: > > I am afraid I never ran the online data migration > > > > This cluster ran Ocata > > Then we updated to Rocky. We went though Pike and Queens but just to > > run the db-syncs > > Then we updated from Rocky to train (again, we went though Stein but > > just to run the db-syncs) > > > > Am I in troubles now ? > > > > Thanks, Massimo > > I know some folks were handling parts of this by running each version in > a container. That may be an option to quickly go through the DB migrations. > > Let's see if anyone responds with any tips to make this easy. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Aug 4 13:11:03 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Aug 2020 14:11:03 +0100 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: <20200804123922.tcxglphtn6x2yona@yuggoth.org> References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> <20200804122606.dctnfvxqytfv22ws@localhost> <20200804123922.tcxglphtn6x2yona@yuggoth.org> Message-ID: <38f40d528b2689996b9114157e2c578c71e37942.camel@redhat.com> On Tue, 2020-08-04 at 12:39 +0000, Jeremy Stanley wrote: > On 2020-08-04 14:26:06 +0200 (+0200), Gorka Eguileor wrote: > [...] > > Looking at one of my patches I see that the right version of > > dogpile.cache==0.6.5 is being installed [1], but then at another step we > > download [2] and install [3] version 1.0.1, and we can see that pip is > > actually complaining that we have incompatibilities [4]. > > > > As far as I can see this is because in that pip install we requested to > > wipe existing installed packages [6] and we are not passing any > > constraints in that call. > > > > I don't know why or where we are doing that though. > > [...] > > Yes, I started digging into this yesterday too. It's affecting all > tox jobs, not just lower-constraints jobs (upper-constraints is > close enough to unconstrained that this isn't immediately apparent > for master branch jobs, but the divergence becomes obvious in stable > branch jobs and it's breaking lots of them). It seems this started > roughly a week ago. > > I don't think we're explicitly doing it, this seems to be a behavior > baked into tox itself. Most projects are currently applying > constraints via the deps parameter in their tox.ini, and tox appears > to invoke pip twice: once to install your deps, and then a second > time to install the project being tested. The latter phase does not > use the deps parameter, and so no constraints get applied. 
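PS2: If I understand Sean's container suggestion correctly, would the fix be to run, for each release we skipped, something like this from a node or container carrying that release's cinder code and our cinder.conf (untested on my side):

# repeat with Queens, Rocky and Stein code until it reports nothing left to migrate
cinder-manage db online_data_migrations

and then re-run "cinder-status upgrade check" on Train to confirm the warning is gone?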
> > We might be able to work around this by going back to overriding > install_command and putting the -c option there instead, right so stephen asked me to remove that override in one of my recent patches to os-vif that is under view since he made the comment the command we were using was more or less the same as the default we currently set teh -c in deps. so if i understand the workaound correclty we woudl add -c {env:CONSTRAINTS_OPT} to install_command so "install_command = pip install -U {opts} {packages} -c {env:CONSTRAINTS_OPT}" in our case and then for the lower contriats jobs in stead of deps = -c{toxinidir}/lower-constraints.txt -r{toxinidir}/requirements.txt -r{toxinidir}/test-requirements.txt -r{toxinidir}/doc/requirements.txt we would do setenv = CONSTRAINTS_OPT=-c{toxinidir}/lower-constraints.txt deps = -r{toxinidir}/requirements.txt -r{toxinidir}/test-requirements.txt -r{toxinidir}/doc/requirements.txt that way we can keep the same install command for both but use the correct constrint file. > but I > haven't had an opportunity to test that theory yet. If anyone else > has time to pursue this line of investigation, I'd be curious to > hear whether it helps. From fungi at yuggoth.org Tue Aug 4 13:16:43 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Aug 2020 13:16:43 +0000 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: <38f40d528b2689996b9114157e2c578c71e37942.camel@redhat.com> References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> <20200804122606.dctnfvxqytfv22ws@localhost> <20200804123922.tcxglphtn6x2yona@yuggoth.org> <38f40d528b2689996b9114157e2c578c71e37942.camel@redhat.com> Message-ID: <20200804131643.dsdpoea4proojeky@yuggoth.org> On 2020-08-04 14:11:03 +0100 (+0100), Sean Mooney wrote: [...] > so if i understand the workaound correclty we woudl add -c > {env:CONSTRAINTS_OPT} to install_command so "install_command = pip > install -U {opts} {packages} -c {env:CONSTRAINTS_OPT}" in our case > and then for the lower contriats jobs in stead of > > deps = > -c{toxinidir}/lower-constraints.txt > -r{toxinidir}/requirements.txt > -r{toxinidir}/test-requirements.txt > -r{toxinidir}/doc/requirements.txt > > we would do > > setenv = > CONSTRAINTS_OPT=-c{toxinidir}/lower-constraints.txt > deps = > -r{toxinidir}/requirements.txt > -r{toxinidir}/test-requirements.txt > -r{toxinidir}/doc/requirements.txt > > that way we can keep the same install command for both but use the > correct constrint file. [...] Yep, Sean McGinnis is trying a variant of that in https://review.opendev.org/744698 now to see if it alters tox's behavior like we expect. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From rafaelweingartner at gmail.com Tue Aug 4 13:20:06 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 4 Aug 2020 10:20:06 -0300 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: I am not sure how the projects/communities here in OpenStack are maintained and conducted, but I could for sure help. I am a committer and PMC for some Apache projects; therefore, I am a bit familiar with some processes in OpenSource communities. On Tue, Aug 4, 2020 at 5:11 AM Mark Goddard wrote: > On Thu, 30 Jul 2020 at 14:43, Rafael Weingärtner > wrote: > > > > We are working on it. 
So far we have 3 open proposals there, but we do > not have enough karma to move things along. > > Besides these 3 open proposals, we do have more ongoing extensions that > have not yet been proposed to the community. > > It's good to hear you want to help improve cloudkitty, however it > sounds like what is required is help with maintaining the project. Is > that something you could be involved with? > Mark > > > > > On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis > wrote: > >> > >> Posting here to raise awareness, and start discussion about next steps. > >> > >> It appears there is no one working on Cloudkitty anymore. No patches > >> have been merged for several months now, including simple bot proposed > >> patches. It would appear no one is maintaining this project anymore. > >> > >> I know there is a need out there for this type of functionality, so > >> maybe this will raise awareness and get some attention to it. But > >> barring that, I am wondering if we should start the process to retire > >> this project. > >> > >> From a Victoria release perspective, it is milestone-2 week, so we > >> should make a decision if any of the Cloudkitty deliverables should be > >> included in this release or not. We can certainly force releases of > >> whatever is the latest, but I think that is a bit risky since these > >> repos have never merged the job template change for victoria and > >> therefore are not even testing with Python 3.8. That is an official > >> runtime for Victoria, so we run the risk of having issues with the code > >> if someone runs under 3.8 but we have not tested to make sure there are > >> no problems doing so. > >> > >> I am hoping this at least starts the discussion. I will not propose any > >> release patches to remove anything until we have had a chance to discuss > >> the situation. > >> > >> Sean > >> > >> > > > > > > -- > > Rafael Weingärtner > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From CAPSEY at augusta.edu Tue Aug 4 13:39:35 2020 From: CAPSEY at augusta.edu (Apsey, Christopher) Date: Tue, 4 Aug 2020 13:39:35 +0000 Subject: [nova] Hyper-V hosts In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04814461@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814461@gmsxchsvr01.thecreation.com> Message-ID: You currently need a hyper-v host that is running at least Windows Insider Build 19640 in order to use Epyc with nested virtualization [1]. See if the beta compute driver works[2]. The hyper-v driver for ussuri has release notes[3], so it should be OK, although I haven't personally tried it. Chris Apsey [1] https://github.com/MicrosoftDocs/Virtualization-Documentation/issues/1276 [2] https://www.cloudbase.it/downloads/HyperVNovaCompute_Beta.msi [3] https://docs.openstack.org/releasenotes/compute-hyperv/ussuri.html -----Original Message----- From: Eric K. Miller Sent: Tuesday, August 4, 2020 1:03 AM To: openstack-discuss at lists.openstack.org Subject: [EXTERNAL] [nova] Hyper-V hosts CAUTION: EXTERNAL SENDER This email originated from an external source. Please exercise caution before opening attachments, clicking links, replying, or providing information to the sender. 
If you believe it to be fraudulent, contact the AU Cybersecurity Hotline at 72-CYBER (2-9237 / 706-722-9237) or 72CYBER at augusta.edu Hi, I thought I'd look into support of Hyper-V hosts for Windows Server environments, but it looks like the latest cloudbase Windows Hyper-V OpenStack Installer is for Train, and nothing seems to discuss the use of Hyper-V in Windows Server 2019. Has it been abandoned? Is anyone using Hyper-V with OpenStack successfully? One of the reasons we thought we might support it is to provide nested support for VMs with GPUs and/or vGPUs, and thought this would work better than with KVM, specifically with AMD EPYC systems. It seems that when "options kvm-amd nested=1" is used in a modprobe.d config file, Windows machines lock up when started. I think this has been an issue for a while with AMD processors, but thought it was fixed recently (I don't remember where I saw this, though). Would love to hear about any experiences related to Hyper-V and/or nested hypervisor support on AMD EPYC processors. Thanks! Eric From rosmaita.fossdev at gmail.com Tue Aug 4 13:48:37 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 4 Aug 2020 09:48:37 -0400 Subject: [cinder] victoria virtual mid-cycle next week Message-ID: <64a4d8e5-d271-8b96-eed2-167f8daf900a@gmail.com> The date/time selection poll has closed, and I am happy to announce the unanimous choice. Session Two of the Cinder Victoria virtual mid-cycle will be held: DATE: 12 August 2020 TIME: 1400-1600 UTC LOCATION: https://bluejeans.com/3228528973 The meeting will be recorded. Please add topics to the etherpad: https://etherpad.opendev.org/p/cinder-victoria-mid-cycles cheers, brian From amotoki at gmail.com Tue Aug 4 14:01:44 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 4 Aug 2020 23:01:44 +0900 Subject: [neutron] bug deputy report (Jul 27 - Aug 2) Message-ID: Hi, Sorry for late just before the meeting. This is my bug deputy report last week. General questions ================= * l3-dvr-backlog tag was originally introduced to identify DVR feature gaps. What should we use for OVN L3 feature gaps? * We have no volunteer for FWaaS now. How should we triage fwaas bugs? Needs attentions ================ Both affects neutron behaviors and they are not simple bugs. More opinions would be appreciated. https://bugs.launchpad.net/neutron/+bug/1889631 [OVS][FW] Multicast non-IGMP traffic is allowed by default, not in iptables FW New, Undecided It might be worth RFE. It affects existing deployments. We need more detail discussion on this. https://bugs.launchpad.net/neutron/+bug/1889454 br-int has an unpredictable MTU New, Undecided This is an interesting topic. More opinions would be appreciated. 
Confirmed ========= [FT][Fullstack] Timeout during OVS bridge creation transaction https://bugs.launchpad.net/neutron/+bug/1889453 Critical, Confirmed In Progress =========== Functional tests neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect failing on Ubuntu 20.04 https://bugs.launchpad.net/neutron/+bug/1889779 High, In Progress CI failture: ImportError: cannot import decorate https://bugs.launchpad.net/neutron/+bug/1890064 High, In Progress, The fix is https://review.opendev.org/#/c/744465/ but blocked by other gate failures Validate subnet when plugging to the router don't works when plugging port https://bugs.launchpad.net/neutron/+bug/1889619 Low, In Progress Won't Fix ========= Functional tests on Ubuntu 20.04 are timed out https://bugs.launchpad.net/neutron/+bug/1889781 High, Won't Fix, it doesn't look like an issue after fixing other issues per slawek ONV feature gaps or cleanups ============================ https://bugs.launchpad.net/neutron/+bug/1889737 [OVN] Stop using neutron.api.rpc.handlers.resources_rpc with OVN as a backend Medium, Confirmed, a kind of cleanup around OVN https://bugs.launchpad.net/neutron/+bug/1889738 [OVN] Stop doing PgDelPortCommand on each router port update Low, Confirmed [OVN] access between Floatings ip and instance with Direct External IP https://bugs.launchpad.net/neutron/+bug/1889388 New, Undecided, OVN feature gap Q: l3-dvr-backlog tag was originally introduced to identify DVR feature gaps. What should we use for OVN L3 feature gaps? FWaaS ====== neutron_tempest_plugin.fwaas.api.test_fwaasv2_extensions failed https://bugs.launchpad.net/neutron/+bug/1889730 New, Undecided, we have no volunteer for FWaaS now. How should we triage fwaas bugs? From mnaser at vexxhost.com Tue Aug 4 14:47:51 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 4 Aug 2020 10:47:51 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. 
# Patches ## Open Reviews - Declare supported runtimes for Wallaby release https://review.opendev.org/743847 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - migrate testing to ubuntu focal https://review.opendev.org/740851 ## Project Updates - Add Keystone Kerberos charm to OpenStack charms https://review.opendev.org/743769 - Deprecate os_congress project https://review.opendev.org/742533 - Add Ceph iSCSI charm to OpenStack charms https://review.opendev.org/744480 ## General Changes - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 - [manila] assert:supports-accessible-upgrade https://review.opendev.org/740509 - V goals, Zuul v3 migration: update links and grenade https://review.opendev.org/741987 ## Abandoned Changes - DNM: testing gate on ubuntu focal https://review.opendev.org/743249 # Email Threads - Legacy Zuul Jobs Update 1: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016058.html - Community PyCharm Licenses: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016039.html - Release Countdown R-10: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016243.html - CloudKitty Status: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016171.html - Migrate CI Jobs to Ubuntu Focal Update: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016222.html - TC Monthly Meeting Reminder: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016196.html # Other Reminders - Aug 4: CFP for Open Infra Summit Closes Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From mkopec at redhat.com Tue Aug 4 15:10:35 2020 From: mkopec at redhat.com (Martin Kopec) Date: Tue, 4 Aug 2020 17:10:35 +0200 Subject: [openstack][tempest] Deprecation of scenario.img_dir option Message-ID: Hello all, we deprecated *scenario.img_dir* option in Tempest by this patch [1]. *scenario.img_file* should contain the full path of an image from now on. However, to make the transition easier for all the Tempest dependent projects, Tempest currently accepts both behaviors - the old where the path to an image consisted of *scenario.img_dir* + *scenario.img_file* and the new one where *scenario.img_file* contains the full path to an image. It will be accepting both ways for one whole release - 25. I proposed patches to projects I found they use scenario.img_dir option, see this link [2]. If you maintain a project from the list [2], please review. If your project somehow uses scenario.img_dir or img_file option and is not in the list [2], please make appropriate changes. [1] https://review.opendev.org/#/c/710996 [2] https://review.opendev.org/#/q/topic:remove_img_dir+(status:open+OR+status:merged) Regards, -- Martin Kopec Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From monika.samal at outlook.com Tue Aug 4 15:28:35 2020 From: monika.samal at outlook.com (Monika Samal) Date: Tue, 4 Aug 2020 15:28:35 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. 
PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann Cc: Monika Samal ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. 
Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Tue Aug 4 17:05:49 2020 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 4 Aug 2020 18:05:49 +0100 Subject: [neutron][OVS firewall] Multicast non-IGMP traffic is allowed by default, not in iptables FW (LP#1889631) Message-ID: Hello all: First of all, the link: https://bugs.launchpad.net/neutron/+bug/1889631 To sum up the bug: in iptables FW, the non-IGMP multicast traffic from 224.0.0.x was blocked; this is not happening in OVS FW. That was discussed today in the Neutron meeting today [1]. We face two possible situations here: - If we block this traffic now, some deployments using the OVS FW will experience an unexpected network blockage. - Deployments migrating from iptables to OVS FW, now won't be able to explicitly allow this traffic (or block it by default). This also breaks the current API, because some rules won't have any effect (those ones allowing this traffic). A possible solution is to add a new knob in the FW configuration; this config option will allow to block or not this traffic by default. Remember that the FW can only create permissive rules, not blocking ones. Any feedback is welcome! Regards. [1] http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-08-04-14.00.log.html#l-136 -------------- next part -------------- An HTML attachment was scrubbed... URL: From elfosardo at gmail.com Tue Aug 4 17:09:33 2020 From: elfosardo at gmail.com (Riccardo Pittau) Date: Tue, 4 Aug 2020 19:09:33 +0200 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: <20200804131643.dsdpoea4proojeky@yuggoth.org> References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> <20200804122606.dctnfvxqytfv22ws@localhost> <20200804123922.tcxglphtn6x2yona@yuggoth.org> <38f40d528b2689996b9114157e2c578c71e37942.camel@redhat.com> <20200804131643.dsdpoea4proojeky@yuggoth.org> Message-ID: Hi all! After a very interesting and enlightening discussion with Sean and Clark on IRC (thanks!), we were able to test and verify that the issue is related to the latest released version of virtualenv, v2.0.29, that embeds pip 2.20, apparently the real offender here. I submitted a bug to virtualenv [1] for that, the fix is included in pip 2.20.1. The bump in virtualenv is already up [2] and merged and a new version has been released, v2.0.30 [3], that should solve this issue. [1] https://github.com/pypa/virtualenv/issues/1914 [2] https://github.com/pypa/virtualenv/pull/1915 [3] https://pypi.org/project/virtualenv/20.0.30/ A si biri, Riccardo On Tue, Aug 4, 2020 at 3:26 PM Jeremy Stanley wrote: > On 2020-08-04 14:11:03 +0100 (+0100), Sean Mooney wrote: > [...] 
> > so if i understand the workaound correclty we woudl add -c > > {env:CONSTRAINTS_OPT} to install_command so "install_command = pip > > install -U {opts} {packages} -c {env:CONSTRAINTS_OPT}" in our case > > and then for the lower contriats jobs in stead of > > > > deps = > > -c{toxinidir}/lower-constraints.txt > > -r{toxinidir}/requirements.txt > > -r{toxinidir}/test-requirements.txt > > -r{toxinidir}/doc/requirements.txt > > > > we would do > > > > setenv = > > CONSTRAINTS_OPT=-c{toxinidir}/lower-constraints.txt > > deps = > > -r{toxinidir}/requirements.txt > > -r{toxinidir}/test-requirements.txt > > -r{toxinidir}/doc/requirements.txt > > > > that way we can keep the same install command for both but use the > > correct constrint file. > [...] > > Yep, Sean McGinnis is trying a variant of that in > https://review.opendev.org/744698 now to see if it alters tox's > behavior like we expect. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Aug 4 17:32:00 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Aug 2020 17:32:00 +0000 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> <20200804122606.dctnfvxqytfv22ws@localhost> <20200804123922.tcxglphtn6x2yona@yuggoth.org> <38f40d528b2689996b9114157e2c578c71e37942.camel@redhat.com> <20200804131643.dsdpoea4proojeky@yuggoth.org> Message-ID: <20200804173200.7fkfmnwr3qofwjsp@yuggoth.org> On 2020-08-04 19:09:33 +0200 (+0200), Riccardo Pittau wrote: > After a very interesting and enlightening discussion with Sean and > Clark on IRC (thanks!), we were able to test and verify that the > issue is related to the latest released version of virtualenv, > v2.0.29, that embeds pip 2.20, apparently the real offender here. [...] That was indeed confusing. Until I skimmed virtualenv's changelog it hadn't dawned on me that all the problem libraries had a "." in their names. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Tue Aug 4 17:38:48 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Aug 2020 17:38:48 +0000 Subject: [nova] openstack-tox-lower-constraints broken In-Reply-To: <20200804173200.7fkfmnwr3qofwjsp@yuggoth.org> References: <20200803125522.rjso5tafqzt3sjoh@lyarwood.usersys.redhat.com> <20200804122606.dctnfvxqytfv22ws@localhost> <20200804123922.tcxglphtn6x2yona@yuggoth.org> <38f40d528b2689996b9114157e2c578c71e37942.camel@redhat.com> <20200804131643.dsdpoea4proojeky@yuggoth.org> <20200804173200.7fkfmnwr3qofwjsp@yuggoth.org> Message-ID: <20200804173848.foqxbkdy72z36dcp@yuggoth.org> On 2020-08-04 17:32:00 +0000 (+0000), Jeremy Stanley wrote: > On 2020-08-04 19:09:33 +0200 (+0200), Riccardo Pittau wrote: > > After a very interesting and enlightening discussion with Sean and > > Clark on IRC (thanks!), we were able to test and verify that the > > issue is related to the latest released version of virtualenv, > > v2.0.29, that embeds pip 2.20, apparently the real offender here. > [...] > > That was indeed confusing. Until I skimmed virtualenv's changelog it > hadn't dawned on me that all the problem libraries had a "." in > their names. Er, pip's changelog I meant, of course. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cohuck at redhat.com Tue Aug 4 16:35:03 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Tue, 4 Aug 2020 18:35:03 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200729080503.GB28676@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> Message-ID: <20200804183503.39f56516.cohuck@redhat.com> [sorry about not chiming in earlier] On Wed, 29 Jul 2020 16:05:03 +0800 Yan Zhao wrote: > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: (...) > > Based on the feedback we've received, the previously proposed interface > > is not viable. I think there's agreement that the user needs to be > > able to parse and interpret the version information. Using json seems > > viable, but I don't know if it's the best option. Is there any > > precedent of markup strings returned via sysfs we could follow? I don't think encoding complex information in a sysfs file is a viable approach. Quoting Documentation/filesystems/sysfs.rst: "Attributes should be ASCII text files, preferably with only one value per file. It is noted that it may not be efficient to contain only one value per file, so it is socially acceptable to express an array of values of the same type. Mixing types, expressing multiple lines of data, and doing fancy formatting of data is heavily frowned upon." Even though this is an older file, I think these restrictions still apply. > I found some examples of using formatted string under /sys, mostly under > tracing. maybe we can do a similar implementation. > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format Note that this is *not* sysfs (anything under debug/ follows different rules anyway!) > > name: kvm_mmio > ID: 32 > format: > field:unsigned short common_type; offset:0; size:2; signed:0; > field:unsigned char common_flags; offset:2; size:1; signed:0; > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > field:int common_pid; offset:4; size:4; signed:1; > > field:u32 type; offset:8; size:4; signed:0; > field:u32 len; offset:12; size:4; signed:0; > field:u64 gpa; offset:16; size:8; signed:0; > field:u64 val; offset:24; size:8; signed:0; > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" }, { 2, "write" }), REC->len, REC->gpa, REC->val > > > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent 'uevent' can probably be considered a special case, I would not really want to copy it. > DRIVER=vfio-pci > PCI_CLASS=30000 > PCI_ID=8086:591D > PCI_SUBSYS_ID=8086:2212 > PCI_SLOT_NAME=0000:00:02.0 > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > (...) > what about a migration_compatible attribute under device node like > below? 
> > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible > SELF: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_2 > aggregator=1 > pv_mode="none+ppgtt+context" > interface_version=3 > COMPATIBLE: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > aggregator={val1}/2 > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > interface_version={val3:int:2,3} > COMPATIBLE: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > aggregator={val1}/2 > pv_mode="" #"" meaning empty, could be absent in a compatible device > interface_version=1 I'd consider anything of a comparable complexity to be a big no-no. If anything, this needs to be split into individual files (with many of them being vendor driver specific anyway.) I think we can list compatible versions in a range/list format, though. Something like cat interface_version 2.1.3 cat interface_version_compatible 2.0.2-2.0.4,2.1.0- (indicating that versions 2.0.{2,3,4} and all versions after 2.1.0 are compatible, considering versions <2 and >2 incompatible by default) Possible compatibility between different mdev types feels a bit odd to me, and should not be included by default (only if it makes sense for a particular vendor driver.) From melwittt at gmail.com Tue Aug 4 21:08:46 2020 From: melwittt at gmail.com (melanie witt) Date: Tue, 4 Aug 2020 14:08:46 -0700 Subject: [placement][gate] functional tests failing Message-ID: <6486f281-5124-4566-af62-55c8a71905bf@gmail.com> Hi all, I recently proposed a change to openstack/placement and found that the functional tests are currently failing. It's because of a recent-ish bump to upper-constraints to allow os-traits 2.4.0: https://review.opendev.org/739330 and placement has a func test that asserts the number of standard traits (more traits are available in 2.4.0). I've proposed a fix for the func test here if anyone could please help review: https://review.opendev.org/744790 Cheers, -melanie From kennelson11 at gmail.com Tue Aug 4 21:22:04 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 4 Aug 2020 14:22:04 -0700 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: I think the majority of 'maintenance' activities at the moment for Cloudkitty are the reviewing of open patches in gerrit [1] and triaging bugs that are reported in Launchpad[2] as they come in. When things come up on this mailing list that have the cloudkitty tag in the subject line (like this email), weighing in on them would also be helpful. If you need help getting setup with gerrit, I am happy to assist anyway I can :) -Kendall Nelson (diablo_rojo) [1] https://review.opendev.org/#/q/project:openstack/cloudkitty+OR+project:openstack/python-cloudkittyclient+OR+project:openstack/cloudkitty-dashboard [2] https://launchpad.net/cloudkitty On Tue, Aug 4, 2020 at 6:21 AM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > I am not sure how the projects/communities here in OpenStack are > maintained and conducted, but I could for sure help. > I am a committer and PMC for some Apache projects; therefore, I am a bit > familiar with some processes in OpenSource communities. > > On Tue, Aug 4, 2020 at 5:11 AM Mark Goddard wrote: > >> On Thu, 30 Jul 2020 at 14:43, Rafael Weingärtner >> wrote: >> > >> > We are working on it. So far we have 3 open proposals there, but we do >> not have enough karma to move things along. 
>> > Besides these 3 open proposals, we do have more ongoing extensions that >> have not yet been proposed to the community. >> >> It's good to hear you want to help improve cloudkitty, however it >> sounds like what is required is help with maintaining the project. Is >> that something you could be involved with? >> Mark >> >> > >> > On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis >> wrote: >> >> >> >> Posting here to raise awareness, and start discussion about next steps. >> >> >> >> It appears there is no one working on Cloudkitty anymore. No patches >> >> have been merged for several months now, including simple bot proposed >> >> patches. It would appear no one is maintaining this project anymore. >> >> >> >> I know there is a need out there for this type of functionality, so >> >> maybe this will raise awareness and get some attention to it. But >> >> barring that, I am wondering if we should start the process to retire >> >> this project. >> >> >> >> From a Victoria release perspective, it is milestone-2 week, so we >> >> should make a decision if any of the Cloudkitty deliverables should be >> >> included in this release or not. We can certainly force releases of >> >> whatever is the latest, but I think that is a bit risky since these >> >> repos have never merged the job template change for victoria and >> >> therefore are not even testing with Python 3.8. That is an official >> >> runtime for Victoria, so we run the risk of having issues with the code >> >> if someone runs under 3.8 but we have not tested to make sure there are >> >> no problems doing so. >> >> >> >> I am hoping this at least starts the discussion. I will not propose any >> >> release patches to remove anything until we have had a chance to >> discuss >> >> the situation. >> >> >> >> Sean >> >> >> >> >> > >> > >> > -- >> > Rafael Weingärtner >> > > > -- > Rafael Weingärtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Tue Aug 4 19:41:48 2020 From: thomas.king at gmail.com (Thomas King) Date: Tue, 4 Aug 2020 13:41:48 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Changing the ml2 flat_networks from specific physical networks to a wildcard allowed me to create a segment. I may be unstuck. New config: [ml2_type_flat] flat_networks=* Now to try creating the subnet and try a remote provision. Tom King On Mon, Aug 3, 2020 at 3:58 PM Thomas King wrote: > I've been using named physical networks so long, I completely forgot using > wildcards! > > Is this the answer???? > > https://docs.openstack.org/mitaka/config-reference/networking/networking_options_reference.html#modular-layer-2-ml2-flat-type-configuration-options > > Tom King > > On Tue, Jul 28, 2020 at 3:46 PM Thomas King wrote: > >> Ruslanas has been a tremendous help. To catch up the discussion lists... >> 1. I enabled Neutron segments. >> 2. I renamed the existing segments for each network so they'll make >> sense. >> 3. I attempted to create a segment for a remote subnet (it is using DHCP >> relay) and this was the error that is blocking me. 
This is where the docs >> do not cover: >> [root at sea-maas-controller ~(keystone_admin)]# openstack network segment >> create --physical-network remote146-30-32 --network-type flat --network >> baremetal seg-remote-146-30-32 >> BadRequestException: 400: Client Error for url: >> http://10.146.30.65:9696/v2.0/segments, Invalid input for operation: >> physical_network 'remote146-30-32' unknown for flat provider network. >> >> I've asked Ruslanas to clarify how their physical networks correspond to >> their remote networks. They have a single provider network and multiple >> segments tied to multiple physical networks. >> >> However, if anyone can shine some light on this, I would greatly >> appreciate it. How should neutron's configurations accommodate remote >> networks<->Neutron segments when I have only one physical network >> attachment for provisioning? >> >> Thanks! >> Tom King >> >> On Wed, Jul 15, 2020 at 3:33 PM Thomas King >> wrote: >> >>> That helps a lot, thank you! >>> >>> "I use only one network..." >>> This bit seems to go completely against the Neutron segments >>> documentation. When you have access, please let me know if Triple-O is >>> using segments or some other method. >>> >>> I greatly appreciate this, this is a tremendous help. >>> >>> Tom King >>> >>> On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis >>> wrote: >>> >>>> Hi Thomas, >>>> >>>> I have a bit complicated setup from tripleo side :) I use only one >>>> network (only ControlPlane). thanks to Harold, he helped to make it work >>>> for me. >>>> >>>> Yes, as written in the tripleo docs for leaf networks, it use the same >>>> neutron network, different subnets. so neutron network is ctlplane (I >>>> think) and have ctlplane-subnet, remote-provision and remote-KI :)) that >>>> generates additional lines in "ip r s" output for routing "foreign" subnets >>>> through correct gw, if you would have isolated networks, by vlans and ports >>>> this would apply for each subnet different gw... I believe you >>>> know/understand that part. >>>> >>>> remote* subnets have dhcp-relay setup by network team... do not ask >>>> details for that. I do not know how to, but can ask :) >>>> >>>> >>>> in undercloud/tripleo i have 2 dhcp servers, one is for introspection, >>>> another for provide/cleanup and deployment process. >>>> >>>> all of those subnets have organization level tagged networks and are >>>> tagged on network devices, but they are untagged on provisioning >>>> interfaces/ports, as in general pxe should be untagged, but some nic's can >>>> do vlan untag on nic/bios level. but who cares!? >>>> >>>> I just did a brief check on your first post, I think I have simmilar >>>> setup to yours :)) I will check in around 12hours :)) more deaply, as will >>>> be at work :))) >>>> >>>> >>>> P.S. sorry for wrong terms, I am bad at naming. >>>> >>>> >>>> On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: >>>> >>>>> Ruslanas, that would be excellent! >>>>> >>>>> I will reply to you directly for details later unless the maillist >>>>> would like the full thread. >>>>> >>>>> Some preliminary questions: >>>>> >>>>> - Do you have a separate physical interface for the segment(s) >>>>> used for your remote subnets? >>>>> The docs state each segment must have a unique physical network >>>>> name, which suggests a separate physical interface for each segment unless >>>>> I'm misunderstanding something. >>>>> - Are your provisioning segments all on the same Neutron network? 
>>>>> - Are you using tagged switchports or access switchports to your >>>>> Ironic server(s)? >>>>> >>>>> Thanks, >>>>> Tom King >>>>> >>>>> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis >>>>> wrote: >>>>> >>>>>> I have deployed that with tripleO, but now we are recabling and >>>>>> redeploying it. So once I have it running I can share my configs, just name >>>>>> which you want :) >>>>>> >>>>>> On Tue, 14 Jul 2020 at 18:40, Thomas King >>>>>> wrote: >>>>>> >>>>>>> I have. That's the Triple-O docs and they don't go through the >>>>>>> normal .conf files to explain how it works outside of Triple-O. It has some >>>>>>> ideas but no running configurations. >>>>>>> >>>>>>> Tom King >>>>>>> >>>>>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis < >>>>>>> ruslanas at lpic.lt> wrote: >>>>>>> >>>>>>>> hi, have you checked: >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>>>>> ? >>>>>>>> I am following this link. I only have one network, having different >>>>>>>> issues tho ;) >>>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Tue Aug 4 23:21:35 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Tue, 4 Aug 2020 18:21:35 -0500 Subject: [nova] Hyper-V hosts In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814461@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0481446B@gmsxchsvr01.thecreation.com> > You currently need a hyper-v host that is running at least Windows Insider > Build 19640 in order to use Epyc with nested virtualization [1]. See if the beta > compute driver works[2]. The hyper-v driver for ussuri has release notes[3], > so it should be OK, although I haven't personally tried it. > > Chris Apsey Thank you Chris! I must have seen the Windows Insider Build notes somewhere about this. Thanks for the link! Glad to see that development continues on the cloudbase components. We'll take a test run with this in the near future when we have some hardware dedicated to this. Eric From thomas.king at gmail.com Tue Aug 4 22:22:11 2020 From: thomas.king at gmail.com (Thomas King) Date: Tue, 4 Aug 2020 16:22:11 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Getting closer. I was able to create the segment and the subnet for the remote network on that segment. When I attempted to provide the baremetal node, Neutron is unable to create/attach a port to the remote node: WARNING ironic.common.neutron [req-b3f373fc-e76a-4c13-9ebb-41cfc682d31b 4946f15716c04f8585d013e364802c6c 1664a38fc668432ca6bee9189be142d9 - default default] The local_link_connection is required for 'neutron' network interface and is not present in the nodes 3ed87e51-00c5-4b27-95c0-665c8337e49b port ccc335c6-3521-48a5-927d-d7ee13f7f05b I changed its network interface from neutron back to flat and it went past this. I'm now waiting to see if the node will PXE boot. On Tue, Aug 4, 2020 at 1:41 PM Thomas King wrote: > Changing the ml2 flat_networks from specific physical networks to a > wildcard allowed me to create a segment. I may be unstuck. > > New config: > [ml2_type_flat] > flat_networks=* > > Now to try creating the subnet and try a remote provision. > > Tom King > > On Mon, Aug 3, 2020 at 3:58 PM Thomas King wrote: > >> I've been using named physical networks so long, I completely forgot >> using wildcards! >> >> Is this the answer???? 
>> >> https://docs.openstack.org/mitaka/config-reference/networking/networking_options_reference.html#modular-layer-2-ml2-flat-type-configuration-options >> >> Tom King >> >> On Tue, Jul 28, 2020 at 3:46 PM Thomas King >> wrote: >> >>> Ruslanas has been a tremendous help. To catch up the discussion lists... >>> 1. I enabled Neutron segments. >>> 2. I renamed the existing segments for each network so they'll make >>> sense. >>> 3. I attempted to create a segment for a remote subnet (it is using DHCP >>> relay) and this was the error that is blocking me. This is where the docs >>> do not cover: >>> [root at sea-maas-controller ~(keystone_admin)]# openstack network segment >>> create --physical-network remote146-30-32 --network-type flat --network >>> baremetal seg-remote-146-30-32 >>> BadRequestException: 400: Client Error for url: >>> http://10.146.30.65:9696/v2.0/segments, Invalid input for operation: >>> physical_network 'remote146-30-32' unknown for flat provider network. >>> >>> I've asked Ruslanas to clarify how their physical networks correspond to >>> their remote networks. They have a single provider network and multiple >>> segments tied to multiple physical networks. >>> >>> However, if anyone can shine some light on this, I would greatly >>> appreciate it. How should neutron's configurations accommodate remote >>> networks<->Neutron segments when I have only one physical network >>> attachment for provisioning? >>> >>> Thanks! >>> Tom King >>> >>> On Wed, Jul 15, 2020 at 3:33 PM Thomas King >>> wrote: >>> >>>> That helps a lot, thank you! >>>> >>>> "I use only one network..." >>>> This bit seems to go completely against the Neutron segments >>>> documentation. When you have access, please let me know if Triple-O is >>>> using segments or some other method. >>>> >>>> I greatly appreciate this, this is a tremendous help. >>>> >>>> Tom King >>>> >>>> On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> Hi Thomas, >>>>> >>>>> I have a bit complicated setup from tripleo side :) I use only one >>>>> network (only ControlPlane). thanks to Harold, he helped to make it work >>>>> for me. >>>>> >>>>> Yes, as written in the tripleo docs for leaf networks, it use the same >>>>> neutron network, different subnets. so neutron network is ctlplane (I >>>>> think) and have ctlplane-subnet, remote-provision and remote-KI :)) that >>>>> generates additional lines in "ip r s" output for routing "foreign" subnets >>>>> through correct gw, if you would have isolated networks, by vlans and ports >>>>> this would apply for each subnet different gw... I believe you >>>>> know/understand that part. >>>>> >>>>> remote* subnets have dhcp-relay setup by network team... do not ask >>>>> details for that. I do not know how to, but can ask :) >>>>> >>>>> >>>>> in undercloud/tripleo i have 2 dhcp servers, one is for introspection, >>>>> another for provide/cleanup and deployment process. >>>>> >>>>> all of those subnets have organization level tagged networks and are >>>>> tagged on network devices, but they are untagged on provisioning >>>>> interfaces/ports, as in general pxe should be untagged, but some nic's can >>>>> do vlan untag on nic/bios level. but who cares!? >>>>> >>>>> I just did a brief check on your first post, I think I have simmilar >>>>> setup to yours :)) I will check in around 12hours :)) more deaply, as will >>>>> be at work :))) >>>>> >>>>> >>>>> P.S. sorry for wrong terms, I am bad at naming. 
>>>>> >>>>> >>>>> On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: >>>>> >>>>>> Ruslanas, that would be excellent! >>>>>> >>>>>> I will reply to you directly for details later unless the maillist >>>>>> would like the full thread. >>>>>> >>>>>> Some preliminary questions: >>>>>> >>>>>> - Do you have a separate physical interface for the segment(s) >>>>>> used for your remote subnets? >>>>>> The docs state each segment must have a unique physical network >>>>>> name, which suggests a separate physical interface for each segment unless >>>>>> I'm misunderstanding something. >>>>>> - Are your provisioning segments all on the same Neutron network? >>>>>> - Are you using tagged switchports or access switchports to your >>>>>> Ironic server(s)? >>>>>> >>>>>> Thanks, >>>>>> Tom King >>>>>> >>>>>> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis < >>>>>> ruslanas at lpic.lt> wrote: >>>>>> >>>>>>> I have deployed that with tripleO, but now we are recabling and >>>>>>> redeploying it. So once I have it running I can share my configs, just name >>>>>>> which you want :) >>>>>>> >>>>>>> On Tue, 14 Jul 2020 at 18:40, Thomas King >>>>>>> wrote: >>>>>>> >>>>>>>> I have. That's the Triple-O docs and they don't go through the >>>>>>>> normal .conf files to explain how it works outside of Triple-O. It has some >>>>>>>> ideas but no running configurations. >>>>>>>> >>>>>>>> Tom King >>>>>>>> >>>>>>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis < >>>>>>>> ruslanas at lpic.lt> wrote: >>>>>>>> >>>>>>>>> hi, have you checked: >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>>>>>> ? >>>>>>>>> I am following this link. I only have one network, having >>>>>>>>> different issues tho ;) >>>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Aug 5 01:50:40 2020 From: melwittt at gmail.com (melanie witt) Date: Tue, 4 Aug 2020 18:50:40 -0700 Subject: [placement][gate] functional tests failing In-Reply-To: <6486f281-5124-4566-af62-55c8a71905bf@gmail.com> References: <6486f281-5124-4566-af62-55c8a71905bf@gmail.com> Message-ID: On 8/4/20 14:08, melanie witt wrote: > Hi all, > > I recently proposed a change to openstack/placement and found that the > functional tests are currently failing. It's because of a recent-ish > bump to upper-constraints to allow os-traits 2.4.0: > > https://review.opendev.org/739330 > > and placement has a func test that asserts the number of standard traits > (more traits are available in 2.4.0). > > I've proposed a fix for the func test here if anyone could please help > review: > > https://review.opendev.org/744790 The fix has merged and the placement gate is all clear! -melanie From jasonanderson at uchicago.edu Wed Aug 5 03:49:55 2020 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 5 Aug 2020 03:49:55 +0000 Subject: [swift][ceph] Container ACLs don't seem to be respected on Ceph RGW Message-ID: <757BCAB6-CA22-439E-9C0C-BE4DEC7B7927@uchicago.edu> Hi all, Just scratching my head at this for a while and though I’d ask here in case it saves some time. I’m running a Ceph cluster on the Nautilus release and it’s running Swift via the rgw. I have Keystone authentication turned on. Everything works fine in the normal case of creating containers, uploading files, listing containers, etc. However, I notice that ACLs don’t seem to work. I am not overriding "rgw enforce swift acls”, so it is set to the default of true. 
I can’t seem to share a container or make it public. (Side note, confusingly, the Ceph implementation has a different syntax for public read/write containers, ‘*’ as opposed to ‘*:*’ for public write for example.) Here’s what I’m doing (as admin) swift post —write-acl ‘*’ —read-acl ‘*’ public-container swift stat public-container Account: v1 Container: public-container Objects: 1 Bytes: 5801 Read ACL: * Write ACL: * Sync To: Sync Key: X-Timestamp: 1595883106.23179 X-Container-Bytes-Used-Actual: 8192 X-Storage-Policy: default-placement X-Storage-Class: STANDARD Last-Modified: Wed, 05 Aug 2020 03:42:11 GMT X-Trans-Id: tx000000000000000662156-005f2a2bea-23478-default X-Openstack-Request-Id: tx000000000000000662156-005f2a2bea-23478-default Accept-Ranges: bytes Content-Type: text/plain; charset=utf-8 (as non-admin) swift upload public-container test.txt Warning: failed to create container 'public-container': 409 Conflict: BucketAlreadyExists Object HEAD failed: https://ceph.example.org:7480/swift/v1/public-container/README.md 403 Forbidden swift list public-container Container GET failed: https://ceph.example.org:7480/swift/v1/public-container?format=json 403 Forbidden [first 60 chars of response] b'{"Code":"AccessDenied","BucketName”:”public-container","RequestId":"tx0' Failed Transaction ID: tx000000000000000662162-005f2a2c2a-23478-default What am I missing? Thanks in advance! /Jason From mark at stackhpc.com Wed Aug 5 07:53:20 2020 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Aug 2020 08:53:20 +0100 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: On Tue, 4 Aug 2020 at 16:58, Monika Samal wrote: > Hello Guys, > > With Michaels help I was able to solve the problem but now there is > another error I was able to create my network on vlan but still error > persist. PFB the logs: > > http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ > > Kindly help > > regards, > Monika > ------------------------------ > *From:* Michael Johnson > *Sent:* Monday, August 3, 2020 9:10 PM > *To:* Fabian Zimmermann > *Cc:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Yeah, it looks like nova is failing to boot the instance. > > Check this setting in your octavia.conf files: > https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id > > Also, if kolla-ansible didn't set both of these values correctly, please > open bug reports for kolla-ansible. These all should have been configured > by the deployment tool. > > I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael > > On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: > > Seems like the flavor is missing or empty '' - check for typos and enable > debug. > > Check if the nova req contains valid information/flavor. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 
2020, > 15:46: > > It's registered > > Get Outlook for Android > ------------------------------ > *From:* Fabian Zimmermann > *Sent:* Monday, August 3, 2020 7:08:21 PM > *To:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Did you check the (nova) flavor you use in octavia. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 10:53: > > After Michael suggestion I was able to create load balancer but there is > error in status. > > > > PFB the error link: > > http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ > ------------------------------ > *From:* Monika Samal > *Sent:* Monday, August 3, 2020 2:08 PM > *To:* Michael Johnson > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Thanks a ton Michael for helping me out > ------------------------------ > *From:* Michael Johnson > *Sent:* Friday, July 31, 2020 3:57 AM > *To:* Monika Samal > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Just to close the loop on this, the octavia.conf file had > "project_name = admin" instead of "project_name = service" in the > [service_auth] section. This was causing the keystone errors when > Octavia was communicating with neutron. > > I don't know if that is a bug in kolla-ansible or was just a local > configuration issue. > > Michael > > On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > > > Hello Fabian,, > > > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > > > Regards, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > Hi, > > > > just to debug, could you replace the auth_type password with v3password? > > > > And do a curl against your :5000 and :35357 urls and paste the output. > > > > Fabian > > > > Monika Samal schrieb am Do., 30. Juli 2020, > 22:15: > > > > Hello Fabian, > > > > http://paste.openstack.org/show/796477/ > > > > Thanks, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > The sections should be > > > > service_auth > > keystone_authtoken > > > > if i read the docs correctly. Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > > > Fabian > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Wed Aug 5 09:41:10 2020 From: emiller at genesishosting.com (Eric K. 
Miller) Date: Wed, 5 Aug 2020 04:41:10 -0500 Subject: [cinder][nova] Local storage in compute node Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> Hi, I'm research methods to get around high storage latency for some applications where redundancy does not matter, so using local NVMe drives in compute nodes seems to be the practical choice. However, there does not appear to be a good solution from what I have read. For example, BlockDeviceDriver has been deprecated/removed, LVM is only supported via iSCSI (which is slow) and localization of LVM volumes onto the same compute node as VMs is impossible, and other methods (PCI pass-through, etc.) would require direct access to the local drives, where device cleansing would need to occur after a device was removed from a VM, and I don't believe there is a hook for this. Ephemeral storage appears to be an option, but I believe it has the same issue as PCI pass-through, in that there is no abiilty to automatically cleanse a device after it has been used. In our default configuration, ephemeral storage is redirected to use Ceph, which solves the cleansing issue, but isn't suitable due to its high latency. Also, ephemeral storage appears as a second device, not the root disk, so that complicates a few configurations we have. Is there any other way to write an operating system image onto a local drive and boot from it? Or preferably assign an LVM /dev/mapper path as a device in libvirt (no iSCSI) after configuring a logical volume? or am I missing something? Thanks! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Wed Aug 5 10:03:29 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 5 Aug 2020 05:03:29 -0500 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> In case this is the answer, I found that in nova.conf, under the [libvirt] stanza, images_type can be set to "lvm". This looks like it may do the trick - using the compute node's LVM to provision and mount a logical volume, for either persistent or ephemeral storage defined in the flavor. Can anyone validate that this is the right approach according to our needs? Also, I have read about the LVM device filters - which is important to avoid the host's LVM from seeing the guest's volumes, in case anyone else finds this message. Thanks! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Wed Aug 5 11:19:34 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Wed, 5 Aug 2020 12:19:34 +0100 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> Message-ID: <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> On 05-08-20 05:03:29, Eric K. Miller wrote: > In case this is the answer, I found that in nova.conf, under the > [libvirt] stanza, images_type can be set to "lvm". 
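(Concretely, something along these lines in nova.conf on the compute node -- a hedged sketch, where "vg_instances" is only an example name and the volume group has to be created on the local NVMe drives beforehand:)

    [libvirt]
    images_type = lvm
    images_volume_group = vg_instances

With that in place the libvirt driver carves root and ephemeral disks out of that volume group as logical volumes instead of qcow2/raw files under the instances directory.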
This looks like it > may do the trick - using the compute node's LVM to provision and mount a > logical volume, for either persistent or ephemeral storage defined in > the flavor. > > Can anyone validate that this is the right approach according to our > needs? I'm not sure if it is given your initial requirements. Do you need full host block devices to be provided to the instance? The LVM imagebackend will just provision LVs on top of the provided VG so there's no direct mapping to a full host block device with this approach. That said there's no real alternative available at the moment. > Also, I have read about the LVM device filters - which is important to > avoid the host's LVM from seeing the guest's volumes, in case anyone > else finds this message. Yeah that's a common pitfall when using LVM based ephemeral disks that contain additional LVM PVs/VGs/LVs etc. You need to ensure that the host is configured to not scan these LVs in order for their PVs/VGs/LVs etc to remain hidden from the host: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From smooney at redhat.com Wed Aug 5 11:40:34 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Aug 2020 12:40:34 +0100 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> Message-ID: <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote: > On 05-08-20 05:03:29, Eric K. Miller wrote: > > In case this is the answer, I found that in nova.conf, under the > > [libvirt] stanza, images_type can be set to "lvm". This looks like it > > may do the trick - using the compute node's LVM to provision and mount a > > logical volume, for either persistent or ephemeral storage defined in > > the flavor. > > > > Can anyone validate that this is the right approach according to our > > needs? > > I'm not sure if it is given your initial requirements. > > Do you need full host block devices to be provided to the instance? > > The LVM imagebackend will just provision LVs on top of the provided VG > so there's no direct mapping to a full host block device with this > approach. > > That said there's no real alternative available at the moment. well one alternitive to nova providing local lvm storage is to use the cinder lvm driver but install it on all compute nodes then use the cidner InstanceLocalityFilter to ensure the volume is alocated form the host the vm is on. https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter on drawback to this is that if the if the vm is moved i think you would need to also migrate the cinder volume seperatly afterwards. > > > Also, I have read about the LVM device filters - which is important to > > avoid the host's LVM from seeing the guest's volumes, in case anyone > > else finds this message. > > > Yeah that's a common pitfall when using LVM based ephemeral disks that > contain additional LVM PVs/VGs/LVs etc. 
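A rough sketch of the cinder side of Sean's suggestion above, assuming a cinder-volume service runs on every compute node with its own local volume group; the backend label and volume group name here are assumptions:

[DEFAULT]
enabled_backends = lvm-local
# InstanceLocalityFilter is not in the default filter list and has to be added
scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

[lvm-local]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-local
volume_backend_name = LVM_LOCAL

The volume would then be created with the locality scheduler hint, e.g. "openstack volume create --size 100 --hint local_to_instance=<instance-uuid> scratch", so the scheduler places it on the host that is already running that instance.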
You need to ensure that the host > is configured to not scan these LVs in order for their PVs/VGs/LVs etc > to remain hidden from the host: > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters  > > From smooney at redhat.com Wed Aug 5 11:45:47 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Aug 2020 12:45:47 +0100 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> Message-ID: <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote: > On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote: > > On 05-08-20 05:03:29, Eric K. Miller wrote: > > > In case this is the answer, I found that in nova.conf, under the > > > [libvirt] stanza, images_type can be set to "lvm". This looks like it > > > may do the trick - using the compute node's LVM to provision and mount a > > > logical volume, for either persistent or ephemeral storage defined in > > > the flavor. > > > > > > Can anyone validate that this is the right approach according to our > > > needs? > > > > I'm not sure if it is given your initial requirements. > > > > Do you need full host block devices to be provided to the instance? > > > > The LVM imagebackend will just provision LVs on top of the provided VG > > so there's no direct mapping to a full host block device with this > > approach. > > > > That said there's no real alternative available at the moment. > > well one alternitive to nova providing local lvm storage is to use > the cinder lvm driver but install it on all compute nodes then > use the cidner InstanceLocalityFilter to ensure the volume is alocated form the host > the vm is on. > https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter > on drawback to this is that if the if the vm is moved i think you would need to also migrate the cinder volume > seperatly afterwards. by the way if you were to take this approch i think there is an nvmeof driver so you can use nvme over rdma instead of iscsi. > > > > > > Also, I have read about the LVM device filters - which is important to > > > avoid the host's LVM from seeing the guest's volumes, in case anyone > > > else finds this message. > > > > > > Yeah that's a common pitfall when using LVM based ephemeral disks that > > contain additional LVM PVs/VGs/LVs etc. 
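As a concrete illustration of that host-side filtering, /etc/lvm/lvm.conf on the compute node can restrict scanning to the host's own physical volumes so that guest LVM metadata inside instance disks is never activated on the host; the device paths below are assumptions and depend on the local disk layout:

devices {
    # accept only the host's own PVs, reject everything else
    global_filter = [ "a|^/dev/md0$|", "a|^/dev/nvme0n1p3$|", "r|.*|" ]
}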
You need to ensure that the host > > is configured to not scan these LVs in order for their PVs/VGs/LVs etc > > to remain hidden from the host: > > > > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters > > > > > > > From donny at fortnebula.com Wed Aug 5 12:36:18 2020 From: donny at fortnebula.com (Donny Davis) Date: Wed, 5 Aug 2020 08:36:18 -0400 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> Message-ID: I use local nvme to drive the CI workload for the openstack community for the last year or so. It seems to work pretty well. I just created a filesystem (xfs) and mounted it to /var/lib/nova/instances I moved glance to using my swift backend and it really made the download of the images much faster. It depends on if the workload is going to handle HA or you are expecting to migrate machines. If the workload is ephemeral or HA can be handled in the app I think local storage is still a very viable option. Simpler is better IMO On Wed, Aug 5, 2020 at 7:48 AM Sean Mooney wrote: > On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote: > > On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote: > > > On 05-08-20 05:03:29, Eric K. Miller wrote: > > > > In case this is the answer, I found that in nova.conf, under the > > > > [libvirt] stanza, images_type can be set to "lvm". This looks like > it > > > > may do the trick - using the compute node's LVM to provision and > mount a > > > > logical volume, for either persistent or ephemeral storage defined in > > > > the flavor. > > > > > > > > Can anyone validate that this is the right approach according to our > > > > needs? > > > > > > I'm not sure if it is given your initial requirements. > > > > > > Do you need full host block devices to be provided to the instance? > > > > > > The LVM imagebackend will just provision LVs on top of the provided VG > > > so there's no direct mapping to a full host block device with this > > > approach. > > > > > > That said there's no real alternative available at the moment. > > > > well one alternitive to nova providing local lvm storage is to use > > the cinder lvm driver but install it on all compute nodes then > > use the cidner InstanceLocalityFilter to ensure the volume is alocated > form the host > > the vm is on. > > > https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter > > on drawback to this is that if the if the vm is moved i think you would > need to also migrate the cinder volume > > seperatly afterwards. > by the way if you were to take this approch i think there is an nvmeof > driver so you can use nvme over rdma > instead of iscsi. > > > > > > > > > Also, I have read about the LVM device filters - which is important > to > > > > avoid the host's LVM from seeing the guest's volumes, in case anyone > > > > else finds this message. > > > > > > > > > Yeah that's a common pitfall when using LVM based ephemeral disks that > > > contain additional LVM PVs/VGs/LVs etc. 
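For reference, the simple layout Donny describes above boils down to something like the following on each compute node, with the default qcow2 image backend left in place; the device name is an assumption:

mkfs.xfs /dev/nvme0n1
echo '/dev/nvme0n1 /var/lib/nova/instances xfs defaults,noatime 0 0' >> /etc/fstab
mount /var/lib/nova/instances
chown nova:nova /var/lib/nova/instances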
You need to ensure that the > host > > is configured to not scan these LVs in order for their PVs/VGs/LVs etc > > to remain hidden from the host: > > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters > > > > > > > -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Aug 5 13:01:12 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Aug 2020 14:01:12 +0100 Subject: [cinder][nova] Local storage in compute node In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> Message-ID: <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> On Wed, 2020-08-05 at 08:36 -0400, Donny Davis wrote: > I use local nvme to drive the CI workload for the openstack community for > the last year or so. It seems to work pretty well. I just created a > filesystem (xfs) and mounted it to /var/lib/nova/instances > I moved glance to using my swift backend and it really made the download of > the images much faster. > > It depends on if the workload is going to handle HA or you are expecting to > migrate machines. If the workload is ephemeral or HA can be handled in the > app I think local storage is still a very viable option. > > Simpler is better IMO yes, that works well with the default flat/qcow file format; I assume there was a reason this was not the starting point. the nova lvm backend, I think, does not support thin provisioning, so if you did the same thing creating the volume group on the nvme device you would technically get better write performance after the vm is booted, but the vm spawn is slower since we can't take advantage of thin provisioning and each root disk needs to be copied from the cached image. so just mounting the nova data directory on an nvme drive or a raid of nvme drives works well and is simple to do. I take a slightly more complex approach for my home cluster, where I put the nova data directory on a bcache block device, which puts an nvme pci ssd as a cache in front of my raid 10 of HDDs to accelerate it. from nova's point of view there is nothing special about this setup; it just works. the drawback to this is that you can't change the storage available to a vm without creating a new flavor. exposing the nvme devices, or a subsection of them, via cinder has the advantage of allowing you to use the volume api to tailor the amount of storage per vm rather than creating a bunch of different flavors, but with the overhead of needing to connect to the storage over a network protocol. so there are trade-offs with both approaches. generally I recommend using local storage, e.g. the vm root disk or ephemeral disk, for fast scratchpad space to work on data, but persisting all relevant data permanently via cinder volumes. that requires you to understand which block devices are local and which are remote, but it gives you the best of both worlds. > > > > On Wed, Aug 5, 2020 at 7:48 AM Sean Mooney wrote: > > > On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote: > > > > On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote: > > > > > On 05-08-20 05:03:29, Eric K.
Miller wrote: > > > > > In case this is the answer, I found that in nova.conf, under the > > > > > [libvirt] stanza, images_type can be set to "lvm". This looks like > > > > it > > > > > may do the trick - using the compute node's LVM to provision and > > > > mount a > > > > > logical volume, for either persistent or ephemeral storage defined in > > > > > the flavor. > > > > > > > > > > Can anyone validate that this is the right approach according to our > > > > > needs? > > > > > > > > I'm not sure if it is given your initial requirements. > > > > > > > > Do you need full host block devices to be provided to the instance? > > > > > > > > The LVM imagebackend will just provision LVs on top of the provided VG > > > > so there's no direct mapping to a full host block device with this > > > > approach. > > > > > > > > That said there's no real alternative available at the moment. > > > > > > well one alternitive to nova providing local lvm storage is to use > > > the cinder lvm driver but install it on all compute nodes then > > > use the cidner InstanceLocalityFilter to ensure the volume is alocated > > > > form the host > > > the vm is on. > > > > > > > https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter > > > on drawback to this is that if the if the vm is moved i think you would > > > > need to also migrate the cinder volume > > > seperatly afterwards. > > > > by the way if you were to take this approch i think there is an nvmeof > > driver so you can use nvme over rdma > > instead of iscsi. > > > > > > > > > > > > Also, I have read about the LVM device filters - which is important > > > > to > > > > > avoid the host's LVM from seeing the guest's volumes, in case anyone > > > > > else finds this message. > > > > > > > > > > > > Yeah that's a common pitfall when using LVM based ephemeral disks that > > > > contain additional LVM PVs/VGs/LVs etc. You need to ensure that the > > > > host > > > > is configured to not scan these LVs in order for their PVs/VGs/LVs etc > > > > to remain hidden from the host: > > > > > > > > > > > > > > > > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters > > > > > > > > > > > > > > > > > > > > > > > > > From donny at fortnebula.com Wed Aug 5 13:22:58 2020 From: donny at fortnebula.com (Donny Davis) Date: Wed, 5 Aug 2020 09:22:58 -0400 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> Message-ID: On Wed, Aug 5, 2020 at 9:01 AM Sean Mooney wrote: > On Wed, 2020-08-05 at 08:36 -0400, Donny Davis wrote: > > I use local nvme to drive the CI workload for the openstack community for > > the last year or so. It seems to work pretty well. I just created a > > filesystem (xfs) and mounted it to /var/lib/nova/instances > > I moved glance to using my swift backend and it really made the download > of > > the images much faster. > > > > It depends on if the workload is going to handle HA or you are expecting > to > > migrate machines. 
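Earlier in the thread Sean mentions fronting an HDD RAID with an NVMe cache via bcache and pointing the nova instances directory at it; a rough sketch of that layering, with all device names being assumptions:

make-bcache -C /dev/nvme0n1        # create the cache device on the NVMe
make-bcache -B /dev/md0            # create the backing device on the HDD RAID
# attach the backing device to the cache set (UUID from bcache-super-show /dev/nvme0n1)
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
mkfs.xfs /dev/bcache0
mount /dev/bcache0 /var/lib/nova/instances

As noted, nova itself needs no special configuration for this; it just sees a normal filesystem.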
If the workload is ephemeral or HA can be handled in > the > > app I think local storage is still a very viable option. > > > > Simpler is better IMO > yes that works well with the default flat/qcow file format > i assume there was a reason this was not the starting point. > the nova lvm backend i think does not supprot thin provisioning > so fi you did the same thing creating the volume group on the nvme deivce > you would technically get better write performance after the vm is booted > but > the vm spwan is slower since we cant take advantage of thin providioning > and > each root disk need to be copided form the cahced image. > > so just monting the nova data directory on an nvme driver or a raid of > nvme drives > works well and is simple to do. > > i take a slightly more complex approach from my home cluster wehre i put > the > nova data directory on a bcache block device which puts an nvme pci ssd as > a cache > infront of my raid 10 fo HDDs to acclerate it. from nova point of view > there is nothing special > about this setup it just works. > > the draw back to this is you cant change teh stroage avaiable to a vm > without creating a new flaovr. > exposing the nvme deivce or subsection of them via cinder has the > advantage of allowing you to use > teh vloume api to tailor the amount of storage per vm rather then creating > a bunch of different flavors > but with the over head fo needing to connect to the storage over a network > protocol. > > so there are trade off with both appoches. > generally i recommend using local sotrage e.g. the vm root disk or > ephemeral disk for fast scratchpad space > to work on data bug persitie all relevent data permently via cinder > volumes. that requires you to understand which block > devices a local and which are remote but it give you the best of both > worlds. > > > > > > > > > On Wed, Aug 5, 2020 at 7:48 AM Sean Mooney wrote: > > > > > On Wed, 2020-08-05 at 12:40 +0100, Sean Mooney wrote: > > > > On Wed, 2020-08-05 at 12:19 +0100, Lee Yarwood wrote: > > > > > On 05-08-20 05:03:29, Eric K. Miller wrote: > > > > > > In case this is the answer, I found that in nova.conf, under the > > > > > > [libvirt] stanza, images_type can be set to "lvm". This looks > like > > > > > > it > > > > > > may do the trick - using the compute node's LVM to provision and > > > > > > mount a > > > > > > logical volume, for either persistent or ephemeral storage > defined in > > > > > > the flavor. > > > > > > > > > > > > Can anyone validate that this is the right approach according to > our > > > > > > needs? > > > > > > > > > > I'm not sure if it is given your initial requirements. > > > > > > > > > > Do you need full host block devices to be provided to the instance? > > > > > > > > > > The LVM imagebackend will just provision LVs on top of the > provided VG > > > > > so there's no direct mapping to a full host block device with this > > > > > approach. > > > > > > > > > > That said there's no real alternative available at the moment. > > > > > > > > well one alternitive to nova providing local lvm storage is to use > > > > the cinder lvm driver but install it on all compute nodes then > > > > use the cidner InstanceLocalityFilter to ensure the volume is > alocated > > > > > > form the host > > > > the vm is on. 
> > > > > > > > > > > https://docs.openstack.org/cinder/latest/configuration/block-storage/scheduler-filters.html#instancelocalityfilter > > > > on drawback to this is that if the if the vm is moved i think you > would > > > > > > need to also migrate the cinder volume > > > > seperatly afterwards. > > > > > > by the way if you were to take this approch i think there is an nvmeof > > > driver so you can use nvme over rdma > > > instead of iscsi. > > > > > > > > > > > > > > > Also, I have read about the LVM device filters - which is > important > > > > > > to > > > > > > avoid the host's LVM from seeing the guest's volumes, in case > anyone > > > > > > else finds this message. > > > > > > > > > > > > > > > Yeah that's a common pitfall when using LVM based ephemeral disks > that > > > > > contain additional LVM PVs/VGs/LVs etc. You need to ensure that the > > > > > > host > > > > > is configured to not scan these LVs in order for their PVs/VGs/LVs > etc > > > > > to remain hidden from the host: > > > > > > > > > > > > > > > > > > > > > > > > > > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_filters > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have been through just about every possible nvme backend option for nova and the one that has turned up to be the most reliable and predictable has been simple defaults so far. Right now I am giving an nvme + nfs backend a spin. It doesn't perform badly, but it is not a local nvme. One of the things I have found with nvme is the mdadm raid driver is just not fast enough to keep up if you use anything other than raid0/1 (10) - I have a raid5 array I have got working pretty good - but its still limited. I don't have any vroc capable equipment, so maybe that will make a difference if implemented. I also have an all nvme ceph cluster I plan to test using cephfs (i know rbd is an option, but where is the fun in that). From my experience over the last two years in working with nvme only things, it seems that nothing comes close to matching the performance of what a couple local nvme drives in raid0 can do. NVME is so fast that the rest of my (old) equipment just can't keep up, it really does push things to the limits of what is possible. The all nvme ceph cluster does push my 40G network to its limits, but I had to create multiple OSD's per nvme to get there - for my gear (intel DC p3600's) I ended up at 3 OSD's per nvme. It seems to me to be limited by network performance. If you have any other questions I am happy to help where I can - I have been working with all nvme stuff for the last couple years and have gotten something into prod for about 1 year with it (maybe a little longer). >From what I can tell, getting max performance from nvme for an instance is a non-trivial task because it's just so much faster than the rest of the stack and careful considerations must be taken to get the most out of it. I am curious to see where you take this Eric -- ~/DonnyD C: 805 814 6800 "No mission too difficult. No sacrifice too great. Duty First" -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at gmx.net Wed Aug 5 14:33:22 2020 From: openstack at gmx.net (Marc Vorwerk) Date: Wed, 05 Aug 2020 16:33:22 +0200 Subject: [nova] Change Volume Type Properties Message-ID: <24E5E9E3-6BF6-492C-BBBB-670DC070CF15@gmx.net> Hi, I'm looking for a way to add the property volume_backend_name to an existing Volume Type which is in use. 
If I try to change this, I got the following error: root at control01rt:~# openstack volume type show test-type +--------------------+--------------------------------------+ | Field              | Value                                | +--------------------+--------------------------------------+ | access_project_ids | None                                 | | description        | None                                 | | id                 | 68febdad-e7b1-4d41-ba11-72d0e1a1cce0 | | is_public          | True                                 | | name               | test-type                            | | properties         |                                      | | qos_specs_id       | None                                 | +--------------------+--------------------------------------+ root at control01rt:~# openstack volume type set --property volume_backend_name=ceph test-type Failed to set volume type property: Volume Type is currently in use. (HTTP 400) (Request-ID: req-2b8f3829-5c16-42c3-ac57-01199688bd58) Command Failed: One or more of the operations failed root at control01rt:~# Problem what I see is, that there are instances/volumes which use this volume type. Have anybody an idea, how I can add the volume_backend_name property to the existing Volume Type? thanks in advance! Regards Marc -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Aug 5 15:14:49 2020 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Aug 2020 16:14:49 +0100 Subject: =?UTF-8?Q?Re=3A_=5Bkolla=5D_Proposing_Micha=C5=82_Nasiadka_for_kayobe=2Dco?= =?UTF-8?Q?re?= In-Reply-To: References: Message-ID: On Tue, 28 Jul 2020 at 16:08, Doug Szumski wrote: > > > On 28/07/2020 15:50, Mark Goddard wrote: > > Hi, > > > > I'd like to propose adding Michał Nasiadka to the kayobe-core group. > > Michał is a valued member of the Kolla core team, and has been > > providing some good patches and reviews for Kayobe too. > > > > Kayobians, please respond with +1/-1. It's been a week, with only approvals - welcome to the core team Michał! > Sounds excellent, +1 for Michał! > > > > Cheers, > > Mark > > From johnsomor at gmail.com Wed Aug 5 15:16:23 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 5 Aug 2020 08:16:23 -0700 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard wrote: > > > On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: > >> Hello Guys, >> >> With Michaels help I was able to solve the problem but now there is >> another error I was able to create my network on vlan but still error >> persist. PFB the logs: >> >> http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ >> >> Kindly help >> >> regards, >> Monika >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Monday, August 3, 2020 9:10 PM >> *To:* Fabian Zimmermann >> *Cc:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Yeah, it looks like nova is failing to boot the instance. 
>> >> Check this setting in your octavia.conf files: >> https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id >> >> Also, if kolla-ansible didn't set both of these values correctly, please >> open bug reports for kolla-ansible. These all should have been configured >> by the deployment tool. >> >> > I wasn't following this thread due to no [kolla] tag, but here are the > recently added docs for Octavia in kolla [1]. Note > the octavia_service_auth_project variable which was added to migrate from > the admin project to the service project for octavia resources. We're > lacking proper automation for the flavor, image etc, but it is being worked > on in Victoria [2]. > > [1] > https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html > [2] https://review.opendev.org/740180 > > Michael >> >> On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann >> wrote: >> >> Seems like the flavor is missing or empty '' - check for typos and enable >> debug. >> >> Check if the nova req contains valid information/flavor. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 15:46: >> >> It's registered >> >> Get Outlook for Android >> ------------------------------ >> *From:* Fabian Zimmermann >> *Sent:* Monday, August 3, 2020 7:08:21 PM >> *To:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Did you check the (nova) flavor you use in octavia. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 10:53: >> >> After Michael suggestion I was able to create load balancer but there is >> error in status. >> >> >> >> PFB the error link: >> >> http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Monday, August 3, 2020 2:08 PM >> *To:* Michael Johnson >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Thanks a ton Michael for helping me out >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Friday, July 31, 2020 3:57 AM >> *To:* Monika Samal >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Just to close the loop on this, the octavia.conf file had >> "project_name = admin" instead of "project_name = service" in the >> [service_auth] section. This was causing the keystone errors when >> Octavia was communicating with neutron. >> >> I don't know if that is a bug in kolla-ansible or was just a local >> configuration issue. >> >> Michael >> >> On Thu, Jul 30, 2020 at 1:39 PM Monika Samal >> wrote: >> > >> > Hello Fabian,, >> > >> > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ >> > >> > Regards, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:57 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > Hi, >> > >> > just to debug, could you replace the auth_type password with v3password? >> > >> > And do a curl against your :5000 and :35357 urls and paste the output. 
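For reference, the two octavia.conf pieces pointed at above look roughly like this; the flavor ID is a placeholder, and on a kolla-ansible deployment these values should come from the deployment tooling rather than hand edits:

[service_auth]
# must be the service project, not admin, or calls to neutron/nova fail
project_name = service

[controller_worker]
# nova flavor used to boot the amphora instances
amp_flavor_id = <amphora-flavor-id>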
>> > >> > Fabian >> > >> > Monika Samal schrieb am Do., 30. Juli 2020, >> 22:15: >> > >> > Hello Fabian, >> > >> > http://paste.openstack.org/show/796477/ >> > >> > Thanks, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:38 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > The sections should be >> > >> > service_auth >> > keystone_authtoken >> > >> > if i read the docs correctly. Maybe you can just paste your config >> (remove/change passwords) to paste.openstack.org and post the link? >> > >> > Fabian >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Wed Aug 5 15:28:52 2020 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Wed, 5 Aug 2020 17:28:52 +0200 Subject: =?utf-8?Q?Re=3A_=5Bkolla=5D_Proposing_Micha=C5=82_Nasiadka_for_ka?= =?utf-8?Q?yobe-core?= In-Reply-To: References: Message-ID: Hi, Thanks for being a part of such a great team! Best regards, Michal > On 5 Aug 2020, at 17:14, Mark Goddard wrote: > > On Tue, 28 Jul 2020 at 16:08, Doug Szumski wrote: >> >> >> On 28/07/2020 15:50, Mark Goddard wrote: >>> Hi, >>> >>> I'd like to propose adding Michał Nasiadka to the kayobe-core group. >>> Michał is a valued member of the Kolla core team, and has been >>> providing some good patches and reviews for Kayobe too. >>> >>> Kayobians, please respond with +1/-1. > It's been a week, with only approvals - welcome to the core team Michał! >> Sounds excellent, +1 for Michał! >>> >>> Cheers, >>> Mark >>> > From jasowang at redhat.com Wed Aug 5 02:22:15 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 5 Aug 2020 10:22:15 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200804183503.39f56516.cohuck@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> Message-ID: On 2020/8/5 上午12:35, Cornelia Huck wrote: > [sorry about not chiming in earlier] > > On Wed, 29 Jul 2020 16:05:03 +0800 > Yan Zhao wrote: > >> On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > (...) > >>> Based on the feedback we've received, the previously proposed interface >>> is not viable. I think there's agreement that the user needs to be >>> able to parse and interpret the version information. Using json seems >>> viable, but I don't know if it's the best option. Is there any >>> precedent of markup strings returned via sysfs we could follow? > I don't think encoding complex information in a sysfs file is a viable > approach. Quoting Documentation/filesystems/sysfs.rst: > > "Attributes should be ASCII text files, preferably with only one value > per file. It is noted that it may not be efficient to contain only one > value per file, so it is socially acceptable to express an array of > values of the same type. > > Mixing types, expressing multiple lines of data, and doing fancy > formatting of data is heavily frowned upon." 
> > Even though this is an older file, I think these restrictions still > apply. +1, that's another reason why devlink(netlink) is better. Thanks From yan.y.zhao at intel.com Wed Aug 5 02:16:54 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 5 Aug 2020 10:16:54 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> Message-ID: <20200805021654.GB30485@joy-OptiPlex-7040> On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: > > On 2020/8/5 上午12:35, Cornelia Huck wrote: > > [sorry about not chiming in earlier] > > > > On Wed, 29 Jul 2020 16:05:03 +0800 > > Yan Zhao wrote: > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > (...) > > > > > > Based on the feedback we've received, the previously proposed interface > > > > is not viable. I think there's agreement that the user needs to be > > > > able to parse and interpret the version information. Using json seems > > > > viable, but I don't know if it's the best option. Is there any > > > > precedent of markup strings returned via sysfs we could follow? > > I don't think encoding complex information in a sysfs file is a viable > > approach. Quoting Documentation/filesystems/sysfs.rst: > > > > "Attributes should be ASCII text files, preferably with only one value > > per file. It is noted that it may not be efficient to contain only one > > value per file, so it is socially acceptable to express an array of > > values of the same type. > > Mixing types, expressing multiple lines of data, and doing fancy > > formatting of data is heavily frowned upon." > > > > Even though this is an older file, I think these restrictions still > > apply. > > > +1, that's another reason why devlink(netlink) is better. > hi Jason, do you have any materials or sample code about devlink, so we can have a good study of it? I found some kernel docs about it but my preliminary study didn't show me the advantage of devlink. Thanks Yan From jasowang at redhat.com Wed Aug 5 02:41:54 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 5 Aug 2020 10:41:54 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200805021654.GB30485@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> Message-ID: <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> On 2020/8/5 上午10:16, Yan Zhao wrote: > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: >> On 2020/8/5 上午12:35, Cornelia Huck wrote: >>> [sorry about not chiming in earlier] >>> >>> On Wed, 29 Jul 2020 16:05:03 +0800 >>> Yan Zhao wrote: >>> >>>> On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: >>> (...) 
>>> >>>>> Based on the feedback we've received, the previously proposed interface >>>>> is not viable. I think there's agreement that the user needs to be >>>>> able to parse and interpret the version information. Using json seems >>>>> viable, but I don't know if it's the best option. Is there any >>>>> precedent of markup strings returned via sysfs we could follow? >>> I don't think encoding complex information in a sysfs file is a viable >>> approach. Quoting Documentation/filesystems/sysfs.rst: >>> >>> "Attributes should be ASCII text files, preferably with only one value >>> per file. It is noted that it may not be efficient to contain only one >>> value per file, so it is socially acceptable to express an array of >>> values of the same type. >>> Mixing types, expressing multiple lines of data, and doing fancy >>> formatting of data is heavily frowned upon." >>> >>> Even though this is an older file, I think these restrictions still >>> apply. >> >> +1, that's another reason why devlink(netlink) is better. >> > hi Jason, > do you have any materials or sample code about devlink, so we can have a good > study of it? > I found some kernel docs about it but my preliminary study didn't show me the > advantage of devlink. CC Jiri and Parav for a better answer for this. My understanding is that the following advantages are obvious (as I replied in another thread): - existing users (NIC, crypto, SCSI, ib), mature and stable - much better error reporting (ext_ack other than string or errno) - namespace aware - do not couple with kobject Thanks > > Thanks > Yan > From jiri at mellanox.com Wed Aug 5 07:56:47 2020 From: jiri at mellanox.com (Jiri Pirko) Date: Wed, 5 Aug 2020 09:56:47 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> References: <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> Message-ID: <20200805075647.GB2177@nanopsycho> Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang at redhat.com wrote: > >On 2020/8/5 上午10:16, Yan Zhao wrote: >> On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: >> > On 2020/8/5 上午12:35, Cornelia Huck wrote: >> > > [sorry about not chiming in earlier] >> > > >> > > On Wed, 29 Jul 2020 16:05:03 +0800 >> > > Yan Zhao wrote: >> > > >> > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: >> > > (...) >> > > >> > > > > Based on the feedback we've received, the previously proposed interface >> > > > > is not viable. I think there's agreement that the user needs to be >> > > > > able to parse and interpret the version information. Using json seems >> > > > > viable, but I don't know if it's the best option. Is there any >> > > > > precedent of markup strings returned via sysfs we could follow? >> > > I don't think encoding complex information in a sysfs file is a viable >> > > approach. Quoting Documentation/filesystems/sysfs.rst: >> > > >> > > "Attributes should be ASCII text files, preferably with only one value >> > > per file. It is noted that it may not be efficient to contain only one >> > > value per file, so it is socially acceptable to express an array of >> > > values of the same type. 
>> > > Mixing types, expressing multiple lines of data, and doing fancy >> > > formatting of data is heavily frowned upon." >> > > >> > > Even though this is an older file, I think these restrictions still >> > > apply. >> > >> > +1, that's another reason why devlink(netlink) is better. >> > >> hi Jason, >> do you have any materials or sample code about devlink, so we can have a good >> study of it? >> I found some kernel docs about it but my preliminary study didn't show me the >> advantage of devlink. > > >CC Jiri and Parav for a better answer for this. > >My understanding is that the following advantages are obvious (as I replied >in another thread): > >- existing users (NIC, crypto, SCSI, ib), mature and stable >- much better error reporting (ext_ack other than string or errno) >- namespace aware >- do not couple with kobject Jason, what is your use case? > >Thanks > > >> >> Thanks >> Yan >> > From jasowang at redhat.com Wed Aug 5 08:02:48 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 5 Aug 2020 16:02:48 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200805075647.GB2177@nanopsycho> References: <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> Message-ID: On 2020/8/5 下午3:56, Jiri Pirko wrote: > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang at redhat.com wrote: >> On 2020/8/5 上午10:16, Yan Zhao wrote: >>> On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: >>>> On 2020/8/5 上午12:35, Cornelia Huck wrote: >>>>> [sorry about not chiming in earlier] >>>>> >>>>> On Wed, 29 Jul 2020 16:05:03 +0800 >>>>> Yan Zhao wrote: >>>>> >>>>>> On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: >>>>> (...) >>>>> >>>>>>> Based on the feedback we've received, the previously proposed interface >>>>>>> is not viable. I think there's agreement that the user needs to be >>>>>>> able to parse and interpret the version information. Using json seems >>>>>>> viable, but I don't know if it's the best option. Is there any >>>>>>> precedent of markup strings returned via sysfs we could follow? >>>>> I don't think encoding complex information in a sysfs file is a viable >>>>> approach. Quoting Documentation/filesystems/sysfs.rst: >>>>> >>>>> "Attributes should be ASCII text files, preferably with only one value >>>>> per file. It is noted that it may not be efficient to contain only one >>>>> value per file, so it is socially acceptable to express an array of >>>>> values of the same type. >>>>> Mixing types, expressing multiple lines of data, and doing fancy >>>>> formatting of data is heavily frowned upon." >>>>> >>>>> Even though this is an older file, I think these restrictions still >>>>> apply. >>>> +1, that's another reason why devlink(netlink) is better. >>>> >>> hi Jason, >>> do you have any materials or sample code about devlink, so we can have a good >>> study of it? >>> I found some kernel docs about it but my preliminary study didn't show me the >>> advantage of devlink. >> >> CC Jiri and Parav for a better answer for this. 
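As one concrete illustration of the kind of structured reporting devlink already does today, "devlink dev info" returns driver and firmware version information for a device; the device name and values below are purely illustrative:

$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
  driver mlx5_core
  versions:
      running:
        fw.version 16.26.1040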
>> >> My understanding is that the following advantages are obvious (as I replied >> in another thread): >> >> - existing users (NIC, crypto, SCSI, ib), mature and stable >> - much better error reporting (ext_ack other than string or errno) >> - namespace aware >> - do not couple with kobject > Jason, what is your use case? I think the use case is to report device compatibility for live migration. Yan proposed a simple sysfs based migration version first, but it looks not sufficient and something based on JSON is discussed. Yan, can you help to summarize the discussion so far for Jiri as a reference? Thanks > > > >> Thanks >> >> >>> Thanks >>> Yan >>> From dgilbert at redhat.com Wed Aug 5 09:44:23 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Wed, 5 Aug 2020 10:44:23 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200804083708.GA30485@joy-OptiPlex-7040> References: <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200729131255.68730f68@x1.home> <20200730034104.GB32327@joy-OptiPlex-7040> <20200730112930.6f4c5762@x1.home> <20200804083708.GA30485@joy-OptiPlex-7040> Message-ID: <20200805094423.GB3004@work-vm> * Yan Zhao (yan.y.zhao at intel.com) wrote: > > > yes, include a device_api field is better. > > > for mdev, "device_type=vfio-mdev", is it right? > > > > No, vfio-mdev is not a device API, it's the driver that attaches to the > > mdev bus device to expose it through vfio. The device_api exposes the > > actual interface of the vfio device, it's also vfio-pci for typical > > mdev devices found on x86, but may be vfio-ccw, vfio-ap, etc... See > > VFIO_DEVICE_API_PCI_STRING and friends. > > > ok. got it. > > > > > > > device_id=8086591d > > > > > > > > Is device_id interpreted relative to device_type? How does this > > > > relate to mdev_type? If we have an mdev_type, doesn't that fully > > > > defined the software API? > > > > > > > it's parent pci id for mdev actually. > > > > If we need to specify the parent PCI ID then something is fundamentally > > wrong with the mdev_type. The mdev_type should define a unique, > > software compatible interface, regardless of the parent device IDs. If > > a i915-GVTg_V5_2 means different things based on the parent device IDs, > > then then different mdev_types should be reported for those parent > > devices. > > > hmm, then do we allow vendor specific fields? > or is it a must that a vendor specific field should have corresponding > vendor attribute? > > another thing is that the definition of mdev_type in GVT only corresponds > to vGPU computing ability currently, > e.g. i915-GVTg_V5_2, is 1/2 of a gen9 IGD, i915-GVTg_V4_2 is 1/2 of a > gen8 IGD. > It is too coarse-grained to live migration compatibility. Can you explain why that's too coarse? Is this because it's too specific (i.e. that a i915-GVTg_V4_2 could be migrated to a newer device?), or that it's too specific on the exact sizings (i.e. that there may be multiple different sizes of a gen9)? Dave > Do you think we need to update GVT's definition of mdev_type? > And is there any guide in mdev_type definition? > > > > > > > mdev_type=i915-GVTg_V5_2 > > > > > > > > And how are non-mdev devices represented? > > > > > > > non-mdev can opt to not include this field, or as you said below, a > > > vendor signature. 
> > > > > > > > > aggregator=1 > > > > > > pv_mode="none+ppgtt+context" > > > > > > > > These are meaningless vendor specific matches afaict. > > > > > > > yes, pv_mode and aggregator are vendor specific fields. > > > but they are important to decide whether two devices are compatible. > > > pv_mode means whether a vGPU supports guest paravirtualized api. > > > "none+ppgtt+context" means guest can not use pv, or use ppgtt mode pv or > > > use context mode pv. > > > > > > > > > interface_version=3 > > > > > > > > Not much granularity here, I prefer Sean's previous > > > > .[.bugfix] scheme. > > > > > > > yes, .[.bugfix] scheme may be better, but I'm not sure if > > > it works for a complicated scenario. > > > e.g for pv_mode, > > > (1) initially, pv_mode is not supported, so it's pv_mode=none, it's 0.0.0, > > > (2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0, > > > indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa. > > > (3) later, pv_mode=context is also supported, > > > pv_mode="none+ppgtt+context", so it's 0.2.0. > > > > > > But if later, pv_mode=ppgtt is removed. pv_mode="none+context", how to > > > name its version? "none+ppgtt" (0.1.0) is not compatible to > > > "none+context", but "none+ppgtt+context" (0.2.0) is compatible to > > > "none+context". > > > > If pv_mode=ppgtt is removed, then the compatible versions would be > > 0.0.0 or 1.0.0, ie. the major version would be incremented due to > > feature removal. > > > > > Maintain such scheme is painful to vendor driver. > > > > Migration compatibility is painful, there's no way around that. I > > think the version scheme is an attempt to push some of that low level > > burden on the vendor driver, otherwise the management tools need to > > work on an ever growing matrix of vendor specific features which is > > going to become unwieldy and is largely meaningless outside of the > > vendor driver. Instead, the vendor driver can make strategic decisions > > about where to continue to maintain a support burden and make explicit > > decisions to maintain or break compatibility. The version scheme is a > > simplification and abstraction of vendor driver features in order to > > create a small, logical compatibility matrix. Compromises necessarily > > need to be made for that to occur. > > > ok. got it. > > > > > > > COMPATIBLE: > > > > > > device_type=pci > > > > > > device_id=8086591d > > > > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > > > this mixed notation will be hard to parse so i would avoid that. > > > > > > > > Some background, Intel has been proposing aggregation as a solution to > > > > how we scale mdev devices when hardware exposes large numbers of > > > > assignable objects that can be composed in essentially arbitrary ways. > > > > So for instance, if we have a workqueue (wq), we might have an mdev > > > > type for 1wq, 2wq, 3wq,... Nwq. It's not really practical to expose a > > > > discrete mdev type for each of those, so they want to define a base > > > > type which is composable to other types via this aggregation. This is > > > > what this substitution and tagging is attempting to accomplish. So > > > > imagine this set of values for cases where it's not practical to unroll > > > > the values for N discrete types. > > > > > > > > > > aggregator={val1}/2 > > > > > > > > So the {val1} above would be substituted here, though an aggregation > > > > factor of 1/2 is a head scratcher... 
> > > > > > > > > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > > > > > > > I'm lost on this one though. I think maybe it's indicating that it's > > > > compatible with any of these, so do we need to list it? Couldn't this > > > > be handled by Sean's version proposal where the minor version > > > > represents feature compatibility? > > > yes, it's indicating that it's compatible with any of these. > > > Sean's version proposal may also work, but it would be painful for > > > vendor driver to maintain the versions when multiple similar features > > > are involved. > > > > This is something vendor drivers need to consider when adding and > > removing features. > > > > > > > > interface_version={val3:int:2,3} > > > > > > > > What does this turn into in a few years, 2,7,12,23,75,96,... > > > > > > > is a range better? > > > > I was really trying to point out that sparseness becomes an issue if > > the vendor driver is largely disconnected from how their feature > > addition and deprecation affects migration support. Thanks, > > > ok. we'll use the x.y.z scheme then. > > Thanks > Yan > -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From yan.y.zhao at intel.com Wed Aug 5 09:33:38 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 5 Aug 2020 17:33:38 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> Message-ID: <20200805093338.GC30485@joy-OptiPlex-7040> On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote: > > On 2020/8/5 下午3:56, Jiri Pirko wrote: > > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang at redhat.com wrote: > > > On 2020/8/5 上午10:16, Yan Zhao wrote: > > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: > > > > > On 2020/8/5 上午12:35, Cornelia Huck wrote: > > > > > > [sorry about not chiming in earlier] > > > > > > > > > > > > On Wed, 29 Jul 2020 16:05:03 +0800 > > > > > > Yan Zhao wrote: > > > > > > > > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > > > > (...) > > > > > > > > > > > > > > Based on the feedback we've received, the previously proposed interface > > > > > > > > is not viable. I think there's agreement that the user needs to be > > > > > > > > able to parse and interpret the version information. Using json seems > > > > > > > > viable, but I don't know if it's the best option. Is there any > > > > > > > > precedent of markup strings returned via sysfs we could follow? > > > > > > I don't think encoding complex information in a sysfs file is a viable > > > > > > approach. Quoting Documentation/filesystems/sysfs.rst: > > > > > > > > > > > > "Attributes should be ASCII text files, preferably with only one value > > > > > > per file. It is noted that it may not be efficient to contain only one > > > > > > value per file, so it is socially acceptable to express an array of > > > > > > values of the same type. > > > > > > Mixing types, expressing multiple lines of data, and doing fancy > > > > > > formatting of data is heavily frowned upon." > > > > > > > > > > > > Even though this is an older file, I think these restrictions still > > > > > > apply. 
> > > > > +1, that's another reason why devlink(netlink) is better. > > > > > > > > > hi Jason, > > > > do you have any materials or sample code about devlink, so we can have a good > > > > study of it? > > > > I found some kernel docs about it but my preliminary study didn't show me the > > > > advantage of devlink. > > > > > > CC Jiri and Parav for a better answer for this. > > > > > > My understanding is that the following advantages are obvious (as I replied > > > in another thread): > > > > > > - existing users (NIC, crypto, SCSI, ib), mature and stable > > > - much better error reporting (ext_ack other than string or errno) > > > - namespace aware > > > - do not couple with kobject > > Jason, what is your use case? > > > I think the use case is to report device compatibility for live migration. > Yan proposed a simple sysfs based migration version first, but it looks not > sufficient and something based on JSON is discussed. > > Yan, can you help to summarize the discussion so far for Jiri as a > reference? > yes. we are currently defining a device live migration compatibility interface in order to let user space like openstack and libvirt know which two devices are live migration compatible. currently the devices include mdev (a kernel emulated virtual device) and physical devices (e.g. a VF of a PCI SRIOV device). the attributes we want user space to compare include the following. common attributes: device_api: vfio-pci, vfio-ccw... mdev_type: the mdev type of an mdev, or a similar signature for a physical device. It specifies a device's hardware capability, e.g. i915-GVTg_V5_4 means it's 1/4 of a gen9 Intel graphics device. software_version: the device driver's version, in a major.minor[.bugfix] scheme, where there is no compatibility across major versions, minor versions have forward compatibility (ex. 1 -> 2 is ok, 2 -> 1 is not), and the bugfix version number indicates some degree of internal improvement that is not visible to the user in terms of features or compatibility. vendor specific attributes: each vendor may define different attributes. device id: the device id of a physical device or an mdev's parent pci device; it could be equal to the pci id for pci devices. aggregator: used together with mdev_type, e.g. aggregator=2 together with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel graphics device. remote_url: for a local NVMe VF, it may be configured with the remote url of a remote storage, and all data is stored on the remote side specified by the remote url. ... Comparing those attributes in user space alone is not an easy job, as it can't simply assume an equal relationship between source and target attributes. e.g. a source device with mdev_type=i915-GVTg_V5_4,aggregator=2 (1/2 of gen9) could actually find a compatible device with mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9) if mdev_type i915-GVTg_V5_4 is not available on the target machine. So, in our current proposal, we want to create two sysfs attributes under a device sysfs node. /sys/<dev>/migration/self /sys/<dev>/migration/compatible #cat /sys/<dev>/migration/self device_type=vfio_pci mdev_type=i915-GVTg_V5_4 device_id=8086591d aggregator=2 software_version=1.0.0 #cat /sys/<dev>/migration/compatible device_type=vfio_pci mdev_type=i915-GVTg_V5_{val1:int:2,4,8} device_id=8086591d aggregator={val1}/2 software_version=1.0.0 The /sys/<dev>/migration/self attribute specifies a device's own attributes. The /sys/<dev>/migration/compatible attribute specifies the list of devices compatible with it.
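To make that concrete with a hypothetical target (the values are only illustrative): a target device whose migration/self reads

device_type=vfio_pci
mdev_type=i915-GVTg_V5_8
device_id=8086591d
aggregator=4
software_version=1.0.0

satisfies the compatible description above, since val1=8 is in the allowed set and aggregator equals 8/2=4, while a target reporting mdev_type=i915-GVTg_V5_8 with aggregator=2 would not match.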
as in the example, compatible devices could have device_type == vfio_pci && device_id == 8086591d && software_version == 1.0.0 && ( (mdev_type of i915-GVTg_V5_2 && aggregator==1) || (mdev_type of i915-GVTg_V5_4 && aggregator==2) || (mdev_type of i915-GVTg_V5_8 && aggregator=4) ) by comparing whether a target device is in compatible list of source device, the user space can know whether a two devices are live migration compatible. Additional notes: 1)software_version in the compatible list may not be necessary as it already has a major.minor.bugfix scheme. 2)for vendor attribute like remote_url, it may not be statically assigned and could be changed with a device interface. So, as Cornelia pointed that it's not good to use complex format in a sysfs attribute, we'd like to know whether there're other good ways to our use case, e.g. splitting a single attribute to multiple simple sysfs attributes as what Cornelia suggested or devlink that Jason has strongly recommended. Thanks Yan From jiri at mellanox.com Wed Aug 5 10:53:19 2020 From: jiri at mellanox.com (Jiri Pirko) Date: Wed, 5 Aug 2020 12:53:19 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200805093338.GC30485@joy-OptiPlex-7040> References: <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> Message-ID: <20200805105319.GF2177@nanopsycho> Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: >On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote: >> >> On 2020/8/5 下午3:56, Jiri Pirko wrote: >> > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang at redhat.com wrote: >> > > On 2020/8/5 上午10:16, Yan Zhao wrote: >> > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: >> > > > > On 2020/8/5 上午12:35, Cornelia Huck wrote: >> > > > > > [sorry about not chiming in earlier] >> > > > > > >> > > > > > On Wed, 29 Jul 2020 16:05:03 +0800 >> > > > > > Yan Zhao wrote: >> > > > > > >> > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: >> > > > > > (...) >> > > > > > >> > > > > > > > Based on the feedback we've received, the previously proposed interface >> > > > > > > > is not viable. I think there's agreement that the user needs to be >> > > > > > > > able to parse and interpret the version information. Using json seems >> > > > > > > > viable, but I don't know if it's the best option. Is there any >> > > > > > > > precedent of markup strings returned via sysfs we could follow? >> > > > > > I don't think encoding complex information in a sysfs file is a viable >> > > > > > approach. Quoting Documentation/filesystems/sysfs.rst: >> > > > > > >> > > > > > "Attributes should be ASCII text files, preferably with only one value >> > > > > > per file. It is noted that it may not be efficient to contain only one >> > > > > > value per file, so it is socially acceptable to express an array of >> > > > > > values of the same type. >> > > > > > Mixing types, expressing multiple lines of data, and doing fancy >> > > > > > formatting of data is heavily frowned upon." >> > > > > > >> > > > > > Even though this is an older file, I think these restrictions still >> > > > > > apply. >> > > > > +1, that's another reason why devlink(netlink) is better. 
>> > > > > >> > > > hi Jason, >> > > > do you have any materials or sample code about devlink, so we can have a good >> > > > study of it? >> > > > I found some kernel docs about it but my preliminary study didn't show me the >> > > > advantage of devlink. >> > > >> > > CC Jiri and Parav for a better answer for this. >> > > >> > > My understanding is that the following advantages are obvious (as I replied >> > > in another thread): >> > > >> > > - existing users (NIC, crypto, SCSI, ib), mature and stable >> > > - much better error reporting (ext_ack other than string or errno) >> > > - namespace aware >> > > - do not couple with kobject >> > Jason, what is your use case? >> >> >> I think the use case is to report device compatibility for live migration. >> Yan proposed a simple sysfs based migration version first, but it looks not >> sufficient and something based on JSON is discussed. >> >> Yan, can you help to summarize the discussion so far for Jiri as a >> reference? >> >yes. >we are currently defining an device live migration compatibility >interface in order to let user space like openstack and libvirt knows >which two devices are live migration compatible. >currently the devices include mdev (a kernel emulated virtual device) >and physical devices (e.g. a VF of a PCI SRIOV device). > >the attributes we want user space to compare including >common attribues: > device_api: vfio-pci, vfio-ccw... > mdev_type: mdev type of mdev or similar signature for physical device > It specifies a device's hardware capability. e.g. > i915-GVTg_V5_4 means it's of 1/4 of a gen9 Intel graphics > device. > software_version: device driver's version. > in .[.bugfix] scheme, where there is no > compatibility across major versions, minor versions have > forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and > bugfix version number indicates some degree of internal > improvement that is not visible to the user in terms of > features or compatibility, > >vendor specific attributes: each vendor may define different attributes > device id : device id of a physical devices or mdev's parent pci device. > it could be equal to pci id for pci devices > aggregator: used together with mdev_type. e.g. aggregator=2 together > with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel > graphics device. > remote_url: for a local NVMe VF, it may be configured with a remote > url of a remote storage and all data is stored in the > remote side specified by the remote url. > ... > >Comparing those attributes by user space alone is not an easy job, as it >can't simply assume an equal relationship between source attributes and >target attributes. e.g. >for a source device of mdev_type=i915-GVTg_V5_4,aggregator=2, (1/2 of >gen9), it actually could find a compatible device of >mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9), >if mdev_type of i915-GVTg_V5_4 is not available in the target machine. > >So, in our current proposal, we want to create two sysfs attributes >under a device sysfs node. >/sys//migration/self >/sys//migration/compatible > >#cat /sys//migration/self >device_type=vfio_pci >mdev_type=i915-GVTg_V5_4 >device_id=8086591d >aggregator=2 >software_version=1.0.0 > >#cat /sys//migration/compatible >device_type=vfio_pci >mdev_type=i915-GVTg_V5_{val1:int:2,4,8} >device_id=8086591d >aggregator={val1}/2 >software_version=1.0.0 > >The /sys//migration/self specifies self attributes of >a device. >The /sys//migration/compatible specifies the list of >compatible devices of a device. 
as in the example, compatible devices >could have > device_type == vfio_pci && > device_id == 8086591d && > software_version == 1.0.0 && > ( > (mdev_type of i915-GVTg_V5_2 && aggregator==1) || > (mdev_type of i915-GVTg_V5_4 && aggregator==2) || > (mdev_type of i915-GVTg_V5_8 && aggregator=4) > ) > >by comparing whether a target device is in compatible list of source >device, the user space can know whether a two devices are live migration >compatible. > >Additional notes: >1)software_version in the compatible list may not be necessary as it >already has a major.minor.bugfix scheme. >2)for vendor attribute like remote_url, it may not be statically >assigned and could be changed with a device interface. > >So, as Cornelia pointed that it's not good to use complex format in >a sysfs attribute, we'd like to know whether there're other good ways to >our use case, e.g. splitting a single attribute to multiple simple sysfs >attributes as what Cornelia suggested or devlink that Jason has strongly >recommended. Hi Yan. Thanks for the explanation, I'm still fuzzy about the details. Anyway, I suggest you to check "devlink dev info" command we have implemented for multiple drivers. You can try netdevsim to test this. I think that the info you need to expose might be put there. Devlink creates instance per-device. Specific device driver calls into devlink core to create the instance. What device do you have? What driver is it handled by? > >Thanks >Yan > > > From smooney at redhat.com Wed Aug 5 11:35:01 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Aug 2020 12:35:01 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200805105319.GF2177@nanopsycho> References: <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> Message-ID: <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote: > Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: > > On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote: > > > > > > On 2020/8/5 下午3:56, Jiri Pirko wrote: > > > > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang at redhat.com wrote: > > > > > On 2020/8/5 上午10:16, Yan Zhao wrote: > > > > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: > > > > > > > On 2020/8/5 上午12:35, Cornelia Huck wrote: > > > > > > > > [sorry about not chiming in earlier] > > > > > > > > > > > > > > > > On Wed, 29 Jul 2020 16:05:03 +0800 > > > > > > > > Yan Zhao wrote: > > > > > > > > > > > > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > > > > > > > > > > > > > > (...) > > > > > > > > > > > > > > > > > > Based on the feedback we've received, the previously proposed interface > > > > > > > > > > is not viable. I think there's agreement that the user needs to be > > > > > > > > > > able to parse and interpret the version information. Using json seems > > > > > > > > > > viable, but I don't know if it's the best option. Is there any > > > > > > > > > > precedent of markup strings returned via sysfs we could follow? > > > > > > > > > > > > > > > > I don't think encoding complex information in a sysfs file is a viable > > > > > > > > approach. 
Quoting Documentation/filesystems/sysfs.rst: > > > > > > > > > > > > > > > > "Attributes should be ASCII text files, preferably with only one value > > > > > > > > per file. It is noted that it may not be efficient to contain only one > > > > > > > > value per file, so it is socially acceptable to express an array of > > > > > > > > values of the same type. > > > > > > > > Mixing types, expressing multiple lines of data, and doing fancy > > > > > > > > formatting of data is heavily frowned upon." > > > > > > > > > > > > > > > > Even though this is an older file, I think these restrictions still > > > > > > > > apply. > > > > > > > > > > > > > > +1, that's another reason why devlink(netlink) is better. > > > > > > > > > > > > > > > > > > > hi Jason, > > > > > > do you have any materials or sample code about devlink, so we can have a good > > > > > > study of it? > > > > > > I found some kernel docs about it but my preliminary study didn't show me the > > > > > > advantage of devlink. > > > > > > > > > > CC Jiri and Parav for a better answer for this. > > > > > > > > > > My understanding is that the following advantages are obvious (as I replied > > > > > in another thread): > > > > > > > > > > - existing users (NIC, crypto, SCSI, ib), mature and stable > > > > > - much better error reporting (ext_ack other than string or errno) > > > > > - namespace aware > > > > > - do not couple with kobject > > > > > > > > Jason, what is your use case? > > > > > > > > > I think the use case is to report device compatibility for live migration. > > > Yan proposed a simple sysfs based migration version first, but it looks not > > > sufficient and something based on JSON is discussed. > > > > > > Yan, can you help to summarize the discussion so far for Jiri as a > > > reference? > > > > > > > yes. > > we are currently defining an device live migration compatibility > > interface in order to let user space like openstack and libvirt knows > > which two devices are live migration compatible. > > currently the devices include mdev (a kernel emulated virtual device) > > and physical devices (e.g. a VF of a PCI SRIOV device). > > > > the attributes we want user space to compare including > > common attribues: > > device_api: vfio-pci, vfio-ccw... > > mdev_type: mdev type of mdev or similar signature for physical device > > It specifies a device's hardware capability. e.g. > > i915-GVTg_V5_4 means it's of 1/4 of a gen9 Intel graphics > > device. by the way this nameing sceam works the opisite of how it would have expected i woudl have expected to i915-GVTg_V5 to be the same as i915-GVTg_V5_1 and i915-GVTg_V5_4 to use 4 times the amount of resouce as i915-GVTg_V5_1 not 1 quarter. i would much rather see i915-GVTg_V5_4 express as aggreataor:i915-GVTg_V5=4 e.g. that it is 4 of the basic i915-GVTg_V5 type the invertion of the relationship makes this much harder to resonabout IMO. if i915-GVTg_V5_8 and i915-GVTg_V5_4 are both actully claiming the same resouce and both can be used at the same time with your suggested nameing scemem i have have to fine the mdevtype with the largest value and store that then do math by devidign it by the suffix of the requested type every time i want to claim the resouce in our placement inventoies. if we represent it the way i suggest we dont if it i915-GVTg_V5_8 i know its using 8 of i915-GVTg_V5 it makes it significantly simpler. > > software_version: device driver's version. 
> > in .[.bugfix] scheme, where there is no > > compatibility across major versions, minor versions have > > forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and > > bugfix version number indicates some degree of internal > > improvement that is not visible to the user in terms of > > features or compatibility, > > > > vendor specific attributes: each vendor may define different attributes > > device id : device id of a physical devices or mdev's parent pci device. > > it could be equal to pci id for pci devices > > aggregator: used together with mdev_type. e.g. aggregator=2 together > > with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel > > graphics device. > > remote_url: for a local NVMe VF, it may be configured with a remote > > url of a remote storage and all data is stored in the > > remote side specified by the remote url. > > ... just a minor not that i find ^ much more simmple to understand then the current proposal with self and compatiable. if i have well defiend attibute that i can parse and understand that allow me to calulate the what is and is not compatible that is likely going to more useful as you wont have to keep maintianing a list of other compatible devices every time a new sku is released. in anycase thank for actully shareing ^ as it make it simpler to reson about what you have previously proposed. > > > > Comparing those attributes by user space alone is not an easy job, as it > > can't simply assume an equal relationship between source attributes and > > target attributes. e.g. > > for a source device of mdev_type=i915-GVTg_V5_4,aggregator=2, (1/2 of > > gen9), it actually could find a compatible device of > > mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9), > > if mdev_type of i915-GVTg_V5_4 is not available in the target machine. > > > > So, in our current proposal, we want to create two sysfs attributes > > under a device sysfs node. > > /sys//migration/self > > /sys//migration/compatible > > > > #cat /sys//migration/self > > device_type=vfio_pci > > mdev_type=i915-GVTg_V5_4 > > device_id=8086591d > > aggregator=2 > > software_version=1.0.0 > > > > #cat /sys//migration/compatible > > device_type=vfio_pci > > mdev_type=i915-GVTg_V5_{val1:int:2,4,8} > > device_id=8086591d > > aggregator={val1}/2 > > software_version=1.0.0 > > > > The /sys//migration/self specifies self attributes of > > a device. > > The /sys//migration/compatible specifies the list of > > compatible devices of a device. as in the example, compatible devices > > could have > > device_type == vfio_pci && > > device_id == 8086591d && > > software_version == 1.0.0 && > > ( > > (mdev_type of i915-GVTg_V5_2 && aggregator==1) || > > (mdev_type of i915-GVTg_V5_4 && aggregator==2) || > > (mdev_type of i915-GVTg_V5_8 && aggregator=4) > > ) > > > > by comparing whether a target device is in compatible list of source > > device, the user space can know whether a two devices are live migration > > compatible. > > > > Additional notes: > > 1)software_version in the compatible list may not be necessary as it > > already has a major.minor.bugfix scheme. > > 2)for vendor attribute like remote_url, it may not be statically > > assigned and could be changed with a device interface. > > > > So, as Cornelia pointed that it's not good to use complex format in > > a sysfs attribute, we'd like to know whether there're other good ways to > > our use case, e.g. 
splitting a single attribute to multiple simple sysfs > > attributes as what Cornelia suggested or devlink that Jason has strongly > > recommended. > > Hi Yan. > > Thanks for the explanation, I'm still fuzzy about the details. > Anyway, I suggest you to check "devlink dev info" command we have > implemented for multiple drivers. is devlink exposed as a filesytem we can read with just open? openstack will likely try to leverage libvirt to get this info but when we cant its much simpler to read sysfs then it is to take a a depenency on a commandline too and have to fork shell to execute it and parse the cli output. pyroute2 which we use in some openstack poject has basic python binding for devlink but im not sure how complete it is as i think its relitivly new addtion. if we need to take a dependcy we will but that would be a drawback fo devlink not that that is a large one just something to keep in mind. > You can try netdevsim to test this. > I think that the info you need to expose might be put there. > > Devlink creates instance per-device. Specific device driver calls into > devlink core to create the instance. What device do you have? What > driver is it handled by? > > > > > > Thanks > > Yan > > > > > > > > From whayutin at redhat.com Wed Aug 5 16:23:46 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 5 Aug 2020 10:23:46 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin wrote: > > > On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: > >> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin >> wrote: >> > >> > >> > >> > On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya >> wrote: >> >> >> >> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >> >> > >> >> > >> >> > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi > >> > > wrote: >> >> > >> >> > >> >> > >> >> > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz < >> aschultz at redhat.com >> >> > > wrote: >> >> > >> >> > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >> >> > > wrote: >> >> > > >> >> > > >> >> > > >> >> > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >> >> > > wrote: >> >> > >> >> >> > >> FYI... >> >> > >> >> >> > >> If you find your jobs are failing with an error similar >> to >> >> > [1], you have been rate limited by docker.io < >> http://docker.io> >> >> > via the upstream mirror system and have hit [2]. I've been >> >> > discussing the issue w/ upstream infra, rdo-infra and a few >> CI >> >> > engineers. >> >> > >> >> >> > >> There are a few ways to mitigate the issue however I >> don't >> >> > see any of the options being completed very quickly so I'm >> >> > asking for your patience while this issue is socialized and >> >> > resolved. >> >> > >> >> >> > >> For full transparency we're considering the following >> options. >> >> > >> >> >> > >> 1. move off of docker.io to quay.io >> >> > >> >> > > >> >> > > >> >> > > quay.io also has API rate limit: >> >> > > https://docs.quay.io/issues/429.html >> >> > > >> >> > > Now I'm not sure about how many requests per seconds one >> can >> >> > do vs the other but this would need to be checked with the >> quay >> >> > team before changing anything. >> >> > > Also quay.io had its big downtimes as >> well, >> >> > SLA needs to be considered. >> >> > > >> >> > >> 2. local container builds for each job in master, >> possibly >> >> > ussuri >> >> > > >> >> > > >> >> > > Not convinced. 
>> >> > > You can look at CI logs: >> >> > > - pulling / updating / pushing container images from >> >> > docker.io to local registry takes ~10 >> min on >> >> > standalone (OVH) >> >> > > - building containers from scratch with updated repos and >> >> > pushing them to local registry takes ~29 min on standalone >> (OVH). >> >> > > >> >> > >> >> >> > >> 3. parent child jobs upstream where rpms and containers >> will >> >> > be build and host artifacts for the child jobs >> >> > > >> >> > > >> >> > > Yes, we need to investigate that. >> >> > > >> >> > >> >> >> > >> 4. remove some portion of the upstream jobs to lower the >> >> > impact we have on 3rd party infrastructure. >> >> > > >> >> > > >> >> > > I'm not sure I understand this one, maybe you can give an >> >> > example of what could be removed? >> >> > >> >> > We need to re-evaulate our use of scenarios (e.g. we have two >> >> > scenario010's both are non-voting). There's a reason we >> >> > historically >> >> > didn't want to add more jobs because of these types of >> resource >> >> > constraints. I think we've added new jobs recently and >> likely >> >> > need to >> >> > reduce what we run. Additionally we might want to look into >> reducing >> >> > what we run on stable branches as well. >> >> > >> >> > >> >> > Oh... removing jobs (I thought we would remove some steps of the >> jobs). >> >> > Yes big +1, this should be a continuous goal when working on CI, >> and >> >> > always evaluating what we need vs what we run now. >> >> > >> >> > We should look at: >> >> > 1) services deployed in scenarios that aren't worth testing (e.g. >> >> > deprecated or unused things) (and deprecate the unused things) >> >> > 2) jobs themselves (I don't have any example beside scenario010 >> but >> >> > I'm sure there are more). >> >> > -- >> >> > Emilien Macchi >> >> > >> >> > >> >> > Thanks Alex, Emilien >> >> > >> >> > +1 to reviewing the catalog and adjusting things on an ongoing basis. >> >> > >> >> > All.. it looks like the issues with docker.io >> were >> >> > more of a flare up than a change in docker.io >> policy >> >> > or infrastructure [2]. The flare up started on July 27 8am utc and >> >> > ended on July 27 17:00 utc, see screenshots. >> >> >> >> The numbers of image prepare workers and its exponential fallback >> >> intervals should be also adjusted. I've analysed the log snippet [0] >> for >> >> the connection reset counts by workers versus the times the rate >> >> limiting was triggered. See the details in the reported bug [1]. >> >> >> >> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >> >> >> >> Conn Reset Counts by a Worker PID: >> >> 3 58412 >> >> 2 58413 >> >> 3 58415 >> >> 3 58417 >> >> >> >> which seems too much of (workers*reconnects) and triggers rate limiting >> >> immediately. >> >> >> >> [0] >> >> >> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >> >> >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >> >> >> >> -- >> >> Best regards, >> >> Bogdan Dobrelya, >> >> Irc #bogdando >> >> >> > >> > FYI.. >> > >> > The issue w/ "too many requests" is back. Expect delays and failures >> in attempting to merge your patches upstream across all branches. The >> issue is being tracked as a critical issue. 
>> >> Working with the infra folks and we have identified the authorization >> header as causing issues when we're rediected from docker.io to >> cloudflare. I'll throw up a patch tomorrow to handle this case which >> should improve our usage of the cache. It needs some testing against >> other registries to ensure that we don't break authenticated fetching >> of resources. >> >> Thanks Alex! > FYI.. we have been revisited by the container pull issue, "too many requests". Alex has some fresh patches on it: https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 expect trouble in check and gate: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From yumeng_bao at yahoo.com Wed Aug 5 17:06:03 2020 From: yumeng_bao at yahoo.com (yumeng bao) Date: Thu, 6 Aug 2020 01:06:03 +0800 Subject: [nova] If any spec freeze exception now? References: Message-ID: Hi gibi and all, I wanna mention the SRIOV SmartNIC Support Spec https://review.opendev.org/#/c/742785 This spec is proposed based on feedback from our PTG discussion, yet there are still open questions need to be nailed down. Since this spec involves nova neutron and cyborg, it will probably take a long time to get ideas from different aspects and reach an agreement. Can we keep this as an exception and keep review it to reach closer to an agreement? Hopefully we can reach an agreement in Victoria, and start to land in W. Xinran and I were trying to attend nova’s weekly meeting to discuss this spec, but the time too late for us. :( We will find if there is any other way to sync and response more actively to all your comments and feedback. And just to point out, nova operations support are still one of cyborg’s high priority goals in Victoria, we will keep focus on it and won’t sacrifice time of this goal. Regards, Yumeng -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Wed Aug 5 17:10:28 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 5 Aug 2020 12:10:28 -0500 Subject: [cinder] Change Volume Type Properties In-Reply-To: <24E5E9E3-6BF6-492C-BBBB-670DC070CF15@gmx.net> References: <24E5E9E3-6BF6-492C-BBBB-670DC070CF15@gmx.net> Message-ID: (updated subject with correct project name) On 8/5/20 9:33 AM, Marc Vorwerk wrote: > > Hi, > > I'm looking fora way to add the property /volume_backend_name/ to an > existing Volume Type which is in use. 
> > If I try to change this, I got the following error: > > root at control01rt:~# openstack volume type show test-type > > +--------------------+--------------------------------------+ > > | Field              | Value                                | > > +--------------------+--------------------------------------+ > > | access_project_ids | None                                 | > > | description        | None                                 | > > | id                 | 68febdad-e7b1-4d41-ba11-72d0e1a1cce0 | > > | is_public          | True                                 | > > | name               | test-type                            | > > | properties |                                      | > > | qos_specs_id       | None                         | > > +--------------------+--------------------------------------+ > > root at control01rt:~# openstack volume type set --property > volume_backend_name=ceph test-type > > Failed to set volume type property: Volume Type is currently in use. > (HTTP 400) (Request-ID: req-2b8f3829-5c16-42c3-ac57-01199688bd58) > > Command Failed: One or more of the operations failed > > root at control01rt:~# > > Problem what I see is, that there are instances/volumes which use this > volume type. > > Have anybody an idea, how I can add the /volume_backend_name/ property > to the existing Volume Type? > This is not allowed since the scheduler may have already scheduled these volumes to a different backend than the one you are now specifying in the extra specs. That would lead to a mismatch between the volumes and their volume type that isn't obvious. To get around this, you will need to create a new volume type with the volume_backend_name you want specified first. You can then retype your existing volumes to this new volume type. Assuming most or all of these volumes are already on that backend, the retype operation should just be a quick database update. If needed, you can then delete the original volume type that is no longer being used, then rename the new volume type to get back to using the same type name. This part isn't necessary, but you may need that if you've configured the old name as the default volume type in your cinder.conf file. Hope that helps. Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Aug 5 17:18:03 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Aug 2020 18:18:03 +0100 Subject: [nova] If any spec freeze exception now? In-Reply-To: References: Message-ID: <3c98b824f04d8fa6fd07addd956a097db1aa20ba.camel@redhat.com> On Thu, 2020-08-06 at 01:06 +0800, yumeng bao wrote: > Hi gibi and all, > > I wanna mention the SRIOV SmartNIC Support Spec https://review.opendev.org/#/c/742785 > > This spec is proposed based on feedback from our PTG discussion, yet there are still open questions need to be nailed > down. Since this spec involves nova neutron and cyborg, it will probably take a long time to get ideas from different > aspects and reach an agreement. Can we keep this as an exception and keep review it to reach closer to an agreement? > Hopefully we can reach an agreement in Victoria, and start to land in W. 
well you dont need to close it without an exception the way exception work we normlly give a dealin of 1 week to finalise the spec after its granted so basiclaly unless you think we can fully agreee all the outstanding items before thursday week and merge it then you should just retarget the spec to the backlog or W release and keep working on it rather then ask for an excption. exception are only for thing that we expect to merge in victoria including the code. > > Xinran and I were trying to attend nova’s weekly meeting to discuss this spec, but the time too late for us. :( We > will find if there is any other way to sync and response more actively to all your comments and feedback. > > And just to point out, nova operations support are still one of cyborg’s high priority goals in Victoria, we will > keep focus on it and won’t sacrifice time of this goal. > > > Regards, > Yumeng From balazs.gibizer at est.tech Wed Aug 5 17:31:44 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Wed, 05 Aug 2020 19:31:44 +0200 Subject: [nova] If any spec freeze exception now? In-Reply-To: <3c98b824f04d8fa6fd07addd956a097db1aa20ba.camel@redhat.com> References: <3c98b824f04d8fa6fd07addd956a097db1aa20ba.camel@redhat.com> Message-ID: On Wed, Aug 5, 2020 at 18:18, Sean Mooney wrote: > On Thu, 2020-08-06 at 01:06 +0800, yumeng bao wrote: >> Hi gibi and all, >> >> I wanna mention the SRIOV SmartNIC Support Spec >> https://review.opendev.org/#/c/742785 >> >> This spec is proposed based on feedback from our PTG discussion, >> yet there are still open questions need to be nailed >> down. Since this spec involves nova neutron and cyborg, it will >> probably take a long time to get ideas from different >> aspects and reach an agreement. Can we keep this as an exception >> and keep review it to reach closer to an agreement? >> Hopefully we can reach an agreement in Victoria, and start to land >> in W. > well you dont need to close it without an exception > the way exception work we normlly give a dealin of 1 week to finalise > the spec after its granted > so basiclaly unless you think we can fully agreee all the outstanding > items before thursday week and merge it > then you should just retarget the spec to the backlog or W release > and keep working on it rather then ask for an > excption. exception are only for thing that we expect to merge in > victoria including the code. Agree with Sean. No need for an exception to continue discussing the spec during the V cycle. Having the spec freeze only means that now we know that the SmartNIC spec is not going to be implemented in V. Cheers, gibi >> >> Xinran and I were trying to attend nova’s weekly meeting to >> discuss this spec, but the time too late for us. :( We >> will find if there is any other way to sync and response more >> actively to all your comments and feedback. >> >> And just to point out, nova operations support are still one of >> cyborg’s high priority goals in Victoria, we will >> keep focus on it and won’t sacrifice time of this goal. >> >> >> Regards, >> Yumeng > From gouthampravi at gmail.com Wed Aug 5 18:54:41 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 5 Aug 2020 11:54:41 -0700 Subject: [manila] Doc-a-thon event coming up next Thursday (Aug 6th) In-Reply-To: References: Message-ID: Thank you so much for putting this together Victoria, and Vida! 
As a reminder, we will not be meeting on IRC tomorrow (6th August 2020), but instead will be in https://meetpad.opendev.org/ManilaV-ReleaseDocAThon You can get to the etherpad link for the meeting by going to etherpad.opendev.org instead of meetpad.opendev.org: https://etherpad.opendev.org/p/ManilaV-ReleaseDocAThon Please bring any documentation issues to that meeting Hoping to see you all there! On Mon, Aug 3, 2020 at 12:20 PM Victoria Martínez de la Cruz < victoria at vmartinezdelacruz.com> wrote: > Hi everybody, > > An update on this. We decided to take over the upstream meeting directly > and start *at* the slot of the Manila weekly meeting. We will join the > Jitsi bridge [0] at 3pm UTC time and start going through the list of bugs > we have in [1]. There is no finish time, you can join and leave the bridge > freely. We will also use IRC Freenode channel #openstack-manila if needed. > > If the time slot doesn't work for you (we are aware this is not a friendly > slot for EMEA/APAC), you can still go through the bug list in [1], claim a > bug and work on it. > > If things go well, we plan to do this again in a different slot so > everybody that wants to collaborate can do it. > > Looking forward to see you there, > > Cheers, > > V > > [0] https://meetpad.opendev.org/ManilaV-ReleaseDocAThon > [1] https://ethercalc.openstack.org/ur17jprbprxx > > On Fri, Jul 31, 2020 at 2:05 PM Victoria Martínez de la Cruz < > victoria at vmartinezdelacruz.com> wrote: > >> Hi folks, >> >> We will be organizing a doc-a-thon next Thursday, August 6th, with the >> main goal of improving our docs for the next release. We will be gathering >> on our Freenode channel #openstack-manila after our weekly meeting (3pm >> UTC) and also using a videoconference tool (exact details TBC) to go over a >> curated list of opened doc bugs we have here [0]. >> >> *Your* participation is truly valued, being you an already Manila >> contributor or if you are interested in contributing and you didn't know >> how, so looking forward to seeing you there :) >> >> Cheers, >> >> Victoria >> >> [0] https://ethercalc.openstack.org/ur17jprbprxx >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Wed Aug 5 19:45:28 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 5 Aug 2020 15:45:28 -0400 Subject: [tc] monthly meeting Message-ID: Hi everyone, Here’s the agenda for our monthly TC meeting. It will happen tomorrow (Thursday the 6th) at 1400 UTC in #openstack-tc and I will be your chair. If you can’t attend, please put your name in the “Apologies for Absence” section. https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * ACTIVE INITIATIVES - Follow up on past action items - OpenStack User-facing APIs and CLIs (belmoreira) - W cycle goal selection start - Completion of retirement cleanup (gmann) https://etherpad.opendev.org/p/tc-retirement-cleanup Thank you, Mohammed -- Mohammed Naser VEXXHOST, Inc. From skaplons at redhat.com Wed Aug 5 20:26:01 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 5 Aug 2020 22:26:01 +0200 Subject: [neutron] Drivers team meetings 7.08.2020 and 14.08.2020 Message-ID: Hi, I have internal training this Friday and I will be on PTO next Friday. Because of that I will not be able to chair neutron drivers meetings those 2 weeks. Currently we don’t have any new RFES to discuss so lets cancel those meetings and focus on implementation of RFEs already accepted. 
See You on drivers meeting on Friday, 21.08.2020 — Slawek Kaplonski Principal software engineer Red Hat From nate.johnston at redhat.com Wed Aug 5 22:02:23 2020 From: nate.johnston at redhat.com (Nate Johnston) Date: Wed, 5 Aug 2020 18:02:23 -0400 Subject: [tc][ptl] Proposal: Distributed Project Leadership Message-ID: <20200805220223.2elo2nrauzr575al@firewall> The governing structure for OpenStack projects has long been for a Project Technical Lead (PTL) to be elected to serve as a singular focus for that project. While the PTL role varies significantly from project to project, the PTL has many responsibilities for managing the development and release process for a project as well as representing the project both internally and externally. There have been a number of projects that have expressed an interest in functioning in a mode that would not require a PTL, but would rather devolve the responsibilities of the PTL into various liaison roles, that could in turn be held by one or more contributors. This topic was discussed by the TC at the recent virtual PTG, and we now have a proposal to put forth to the community for comment. Jean-Phillipe Evrard and I worked up a more detailed proposal for everyone to comment on and review. We are calling this the 'distributed project leadership model'. Most importantly, this is an opt-in process where projects interested in pursuing a distributed project leadership model should opt in to it, but for projects satisfied with the status quo nothing would change. I encourage everyone who is interested to examine the proposal and comment: https://review.opendev.org/744995 Thank you, Nate Johnston From jasonanderson at uchicago.edu Wed Aug 5 23:18:28 2020 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 5 Aug 2020 23:18:28 +0000 Subject: [swift][ceph] Container ACLs don't seem to be respected on Ceph RGW In-Reply-To: <757BCAB6-CA22-439E-9C0C-BE4DEC7B7927@uchicago.edu> References: <757BCAB6-CA22-439E-9C0C-BE4DEC7B7927@uchicago.edu> Message-ID: <48C1BF75-211F-4F7C-ABCA-D59777C469A8@uchicago.edu> As an update, I think one of my problems was the dangling space after “_member_” in my ACL list, which was quite painful to discover. I think it was breaking the matching of my user, which had the role _member_ assigned. And, it does look like read ACLs must be of the form “.r:*”, despite the Ceph docs. With this in place, public read ACL works. I still can’t get write ACLs to work though, and from looking at the code[1] I’m not sure how it’s supposed to work. /Jason [1]: https://github.com/ceph/ceph/blob/f52fb99f011d9b124ed91f3d001d3551e9a10c8d/src/rgw/rgw_acl_swift.cc > On Aug 4, 2020, at 10:49 PM, Jason Anderson wrote: > > Hi all, > > Just scratching my head at this for a while and though I’d ask here in case it saves some time. I’m running a Ceph cluster on the Nautilus release and it’s running Swift via the rgw. I have Keystone authentication turned on. Everything works fine in the normal case of creating containers, uploading files, listing containers, etc. > > However, I notice that ACLs don’t seem to work. I am not overriding "rgw enforce swift acls”, so it is set to the default of true. I can’t seem to share a container or make it public. > > (Side note, confusingly, the Ceph implementation has a different syntax for public read/write containers, ‘*’ as opposed to ‘*:*’ for public write for example.) 
> > Here’s what I’m doing > > (as admin) > swift post —write-acl ‘*’ —read-acl ‘*’ public-container > swift stat public-container > Account: v1 > Container: public-container > Objects: 1 > Bytes: 5801 > Read ACL: * > Write ACL: * > Sync To: > Sync Key: > X-Timestamp: 1595883106.23179 > X-Container-Bytes-Used-Actual: 8192 > X-Storage-Policy: default-placement > X-Storage-Class: STANDARD > Last-Modified: Wed, 05 Aug 2020 03:42:11 GMT > X-Trans-Id: tx000000000000000662156-005f2a2bea-23478-default > X-Openstack-Request-Id: tx000000000000000662156-005f2a2bea-23478-default > Accept-Ranges: bytes > Content-Type: text/plain; charset=utf-8 > > (as non-admin) > swift upload public-container test.txt > Warning: failed to create container 'public-container': 409 Conflict: BucketAlreadyExists > Object HEAD failed: https://ceph.example.org:7480/swift/v1/public-container/README.md 403 Forbidden > > swift list public-container > Container GET failed: https://ceph.example.org:7480/swift/v1/public-container?format=json 403 Forbidden [first 60 chars of response] b'{"Code":"AccessDenied","BucketName”:”public-container","RequestId":"tx0' > Failed Transaction ID: tx000000000000000662162-005f2a2c2a-23478-default > > What am I missing? Thanks in advance! > > /Jason From jasonanderson at uchicago.edu Wed Aug 5 23:19:53 2020 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 5 Aug 2020 23:19:53 +0000 Subject: [swift][ceph] Container ACLs don't seem to be respected on Ceph RGW In-Reply-To: <48C1BF75-211F-4F7C-ABCA-D59777C469A8@uchicago.edu> References: <757BCAB6-CA22-439E-9C0C-BE4DEC7B7927@uchicago.edu> <48C1BF75-211F-4F7C-ABCA-D59777C469A8@uchicago.edu> Message-ID: On Aug 5, 2020, at 6:18 PM, Jason Anderson > wrote: As an update, I think one of my problems was the dangling space after “_member_” in my ACL list, which was quite painful to discover. I think it was breaking the matching of my user, which had the role _member_ assigned. Sorry, I meant in my Ceph configuration, which had this line in the rgw section: rgw keystone accepted roles = _member_ , Member, admin And, it does look like read ACLs must be of the form “.r:*”, despite the Ceph docs. With this in place, public read ACL works. I still can’t get write ACLs to work though, and from looking at the code[1] I’m not sure how it’s supposed to work. /Jason [1]: https://github.com/ceph/ceph/blob/f52fb99f011d9b124ed91f3d001d3551e9a10c8d/src/rgw/rgw_acl_swift.cc On Aug 4, 2020, at 10:49 PM, Jason Anderson > wrote: Hi all, Just scratching my head at this for a while and though I’d ask here in case it saves some time. I’m running a Ceph cluster on the Nautilus release and it’s running Swift via the rgw. I have Keystone authentication turned on. Everything works fine in the normal case of creating containers, uploading files, listing containers, etc. However, I notice that ACLs don’t seem to work. I am not overriding "rgw enforce swift acls”, so it is set to the default of true. I can’t seem to share a container or make it public. (Side note, confusingly, the Ceph implementation has a different syntax for public read/write containers, ‘*’ as opposed to ‘*:*’ for public write for example.) 
Here’s what I’m doing (as admin) swift post —write-acl ‘*’ —read-acl ‘*’ public-container swift stat public-container Account: v1 Container: public-container Objects: 1 Bytes: 5801 Read ACL: * Write ACL: * Sync To: Sync Key: X-Timestamp: 1595883106.23179 X-Container-Bytes-Used-Actual: 8192 X-Storage-Policy: default-placement X-Storage-Class: STANDARD Last-Modified: Wed, 05 Aug 2020 03:42:11 GMT X-Trans-Id: tx000000000000000662156-005f2a2bea-23478-default X-Openstack-Request-Id: tx000000000000000662156-005f2a2bea-23478-default Accept-Ranges: bytes Content-Type: text/plain; charset=utf-8 (as non-admin) swift upload public-container test.txt Warning: failed to create container 'public-container': 409 Conflict: BucketAlreadyExists Object HEAD failed: https://ceph.example.org:7480/swift/v1/public-container/README.md 403 Forbidden swift list public-container Container GET failed: https://ceph.example.org:7480/swift/v1/public-container?format=json 403 Forbidden [first 60 chars of response] b'{"Code":"AccessDenied","BucketName”:”public-container","RequestId":"tx0' Failed Transaction ID: tx000000000000000662162-005f2a2c2a-23478-default What am I missing? Thanks in advance! /Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From yumeng_bao at yahoo.com Thu Aug 6 02:11:19 2020 From: yumeng_bao at yahoo.com (yumeng bao) Date: Thu, 6 Aug 2020 10:11:19 +0800 Subject: [nova] If any spec freeze exception now? In-Reply-To: References: Message-ID: <681905FE-A165-42B2-9D67-0AF259F62E5A@yahoo.com> Ok. That make sense to me! Thanks gibi and Sean! Regards, Yumeng > On Aug 6, 2020, at 1:31 AM, Balázs Gibizer wrote: > >  > >> On Wed, Aug 5, 2020 at 18:18, Sean Mooney wrote: >>> On Thu, 2020-08-06 at 01:06 +0800, yumeng bao wrote: >>> Hi gibi and all, >>> I wanna mention the SRIOV SmartNIC Support Spec https://review.opendev.org/#/c/742785 >>> This spec is proposed based on feedback from our PTG discussion, yet there are still open questions need to be nailed >>> down. Since this spec involves nova neutron and cyborg, it will probably take a long time to get ideas from different >>> aspects and reach an agreement. Can we keep this as an exception and keep review it to reach closer to an agreement? >>> Hopefully we can reach an agreement in Victoria, and start to land in W. >> well you dont need to close it without an exception >> the way exception work we normlly give a dealin of 1 week to finalise the spec after its granted >> so basiclaly unless you think we can fully agreee all the outstanding items before thursday week and merge it >> then you should just retarget the spec to the backlog or W release and keep working on it rather then ask for an >> excption. exception are only for thing that we expect to merge in victoria including the code. > > Agree with Sean. No need for an exception to continue discussing the spec during the V cycle. Having the spec freeze only means that now we know that the SmartNIC spec is not going to be implemented in V. > > Cheers, > gibi > >>> Xinran and I were trying to attend nova’s weekly meeting to discuss this spec, but the time too late for us. :( We >>> will find if there is any other way to sync and response more actively to all your comments and feedback. >>> And just to point out, nova operations support are still one of cyborg’s high priority goals in Victoria, we will >>> keep focus on it and won’t sacrifice time of this goal. 
>>> Regards, >>> Yumeng > > From emiller at genesishosting.com Thu Aug 6 04:28:28 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 5 Aug 2020 23:28:28 -0500 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0481447A@gmsxchsvr01.thecreation.com> > Do you need full host block devices to be provided to the instance? No - a thin-provisioned LV in LVM would be best. > The LVM imagebackend will just provision LVs on top of the provided VG so > there's no direct mapping to a full host block device with this approach. That's perfect! > Yeah that's a common pitfall when using LVM based ephemeral disks that > contain additional LVM PVs/VGs/LVs etc. You need to ensure that the host is > configured to not scan these LVs in order for their PVs/VGs/LVs etc to remain > hidden from the host: Thanks for the link! I will let everyone know how testing goes. Eric From emiller at genesishosting.com Thu Aug 6 04:30:01 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 5 Aug 2020 23:30:01 -0500 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0481447B@gmsxchsvr01.thecreation.com> > That said there's no real alternative available at the moment. > well one alternitive to nova providing local lvm storage is to use > the cinder lvm driver but install it on all compute nodes then > use the cidner InstanceLocalityFilter to ensure the volume is alocated form > the host > the vm is on. > https://docs.openstack.org/cinder/latest/configuration/block- > storage/scheduler-filters.html#instancelocalityfilter > on drawback to this is that if the if the vm is moved i think you would need to > also migrate the cinder volume > seperatly afterwards. I wasn't aware of the InstanceLocalityFilter, so thank you for mentioning it! Eric From emiller at genesishosting.com Thu Aug 6 04:39:50 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 5 Aug 2020 23:39:50 -0500 Subject: [cinder][nova] Local storage in compute node In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0481447C@gmsxchsvr01.thecreation.com> From: Donny Davis [mailto:donny at fortnebula.com] Sent: Wednesday, August 05, 2020 8:23 AM > If you have any other questions I am happy to help where I can - I have been working with all nvme stuff for the last couple years and have gotten something into prod for about 1 year with it (maybe a little longer).  
> From what I can tell, getting max performance from nvme for an instance is a non-trivial task because it's just so much faster than the rest of the stack and careful considerations must be taken to get the most out of it.  > I am curious to see where you take this Eric Thanks for the response! We also use Ceph with NVMe SSDs, with many NVMe namespaces with one OSD per namespace, to fully utilize the SSDs. You are right - they are so fast that they are literally faster than any application can use. They are great for multi-tenant environments, though, where it's usually better to have more hardware than people can utilize. My first test is to try using the Libvirt "images_type=lvm" method to see how well it works. I will report back... Eric From emiller at genesishosting.com Thu Aug 6 04:53:49 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Wed, 5 Aug 2020 23:53:49 -0500 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0481447D@gmsxchsvr01.thecreation.com> > From: Sean Mooney [mailto:smooney at redhat.com] > Sent: Wednesday, August 05, 2020 8:01 AM > yes that works well with the default flat/qcow file format > i assume there was a reason this was not the starting point. > the nova lvm backend i think does not supprot thin provisioning > so fi you did the same thing creating the volume group on the nvme deivce > you would technically get better write performance after the vm is booted > but > the vm spwan is slower since we cant take advantage of thin providioning > and > each root disk need to be copided form the cahced image. I wasn't aware that the nova LVM backend ([libvirt]/images_type = lvm) didn't support thin provisioned LV's. However, I do see that the "sparse_logical_volumes" parameter indicates it has been deprecated: https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.sparse_logical_volumes That would definitely be a downer. > so just monting the nova data directory on an nvme driver or a raid of nvme > drives > works well and is simple to do. Maybe we should consider doing this instead. I'll test with the Nova LVM backend first. > so there are trade off with both appoches. > generally i recommend using local sotrage e.g. the vm root disk or ephemeral > disk for fast scratchpad space > to work on data bug persitie all relevent data permently via cinder volumes. > that requires you to understand which block > devices a local and which are remote but it give you the best of both worlds. Our use case simply requires high-speed non-redundant storage for self-replicating applications like Couchbase, Cassandra, MongoDB, etc. or very inexpensive VMs that are backed-up often and can withstand the downtime when restoring from backup. That will be one more requirement (or rather a very nice to have), is to be able to create images (backups) of the local storage onto object storage, so hopefully "openstack server backup create" works like it does with rbd-backed Nova-managed persistent storage. I will let you know what I find out! 
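For reference, the compute-side settings I plan to start testing with look
roughly like this (the volume group name is just a placeholder I picked, and
I still need to confirm the thin-provisioning behavior mentioned above):

[libvirt]
images_type = lvm
# "nova-local" is a placeholder for a VG pre-created on the NVMe device
images_volume_group = nova-local

If that doesn't pan out, the fallback is the simpler approach mentioned
earlier of mounting the NVMe device (or a RAID of them) at the Nova instances
directory and keeping the default qcow/flat image backend.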
Thanks everyone! Eric From marino.mrc at gmail.com Thu Aug 6 07:32:21 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Thu, 6 Aug 2020 09:32:21 +0200 Subject: [tripleo] Deploy overcloud without provisioning Message-ID: Hi, I'm trying to deploy an overcloud using tripleo with pre provisioned nodes. My configuration is quite simple: - 1 controller and 1 compute nodes on which I already installed CentOS 8.2 - Both nodes have a dedicated idrac interface with an ip in 192.168.199.0/24. Please note that this interface is not visible with "ip a" or "ifconfig". It's a dedicated IDRAC interface - Both nodes have a NIC configured in the subnet 192.168.199.0/24 (192.168.199.200 and 192.168.199.201) - Undercloud uses 192.168.199.0/24 as pxe/provisioning network (but I don't need provisioning) Question: should I import nodes with "openstack overcloud node import nodes.yaml" even if I don't need the provisioning step? Furthermore, on the undercloud I created one file: /home/stack/templates/node-info.yaml with the following content parameter_defaults: OvercloudControllerFlavor: control OvercloudComputeFlavor: compute ControllerCount: 1 ComputeCount: 1 Question: How can I specify that "node X with ip Y should be used as a controller and node Z with ip K should be used as a compute"?? Should I set the property with the following command? openstack baremetal node set --property capabilities='profile:control' controller1 Thank you, Marco -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu Aug 6 07:46:01 2020 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Aug 2020 08:46:01 +0100 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: On Wed, 5 Aug 2020 at 16:16, Michael Johnson wrote: > Looking at that error, it appears that the lb-mgmt-net is not setup > correctly. The Octavia controller containers are not able to reach the > amphora instances on the lb-mgmt-net subnet. > > I don't know how kolla is setup to connect the containers to the neutron > lb-mgmt-net network. Maybe the above documents will help with that. > Right now it's up to the operator to configure that. The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. > Michael > > On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard wrote: > >> >> >> On Tue, 4 Aug 2020 at 16:58, Monika Samal >> wrote: >> >>> Hello Guys, >>> >>> With Michaels help I was able to solve the problem but now there is >>> another error I was able to create my network on vlan but still error >>> persist. PFB the logs: >>> >>> http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ >>> >>> Kindly help >>> >>> regards, >>> Monika >>> ------------------------------ >>> *From:* Michael Johnson >>> *Sent:* Monday, August 3, 2020 9:10 PM >>> *To:* Fabian Zimmermann >>> *Cc:* Monika Samal ; openstack-discuss < >>> openstack-discuss at lists.openstack.org> >>> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> >>> Yeah, it looks like nova is failing to boot the instance. >>> >>> Check this setting in your octavia.conf files: >>> https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id >>> >>> Also, if kolla-ansible didn't set both of these values correctly, please >>> open bug reports for kolla-ansible. These all should have been configured >>> by the deployment tool. 
>>> >>> >> I wasn't following this thread due to no [kolla] tag, but here are the >> recently added docs for Octavia in kolla [1]. Note >> the octavia_service_auth_project variable which was added to migrate from >> the admin project to the service project for octavia resources. We're >> lacking proper automation for the flavor, image etc, but it is being worked >> on in Victoria [2]. >> >> [1] >> https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html >> [2] https://review.opendev.org/740180 >> >> Michael >>> >>> On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann >>> wrote: >>> >>> Seems like the flavor is missing or empty '' - check for typos and >>> enable debug. >>> >>> Check if the nova req contains valid information/flavor. >>> >>> Fabian >>> >>> Monika Samal schrieb am Mo., 3. Aug. 2020, >>> 15:46: >>> >>> It's registered >>> >>> Get Outlook for Android >>> ------------------------------ >>> *From:* Fabian Zimmermann >>> *Sent:* Monday, August 3, 2020 7:08:21 PM >>> *To:* Monika Samal ; openstack-discuss < >>> openstack-discuss at lists.openstack.org> >>> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> >>> Did you check the (nova) flavor you use in octavia. >>> >>> Fabian >>> >>> Monika Samal schrieb am Mo., 3. Aug. 2020, >>> 10:53: >>> >>> After Michael suggestion I was able to create load balancer but there is >>> error in status. >>> >>> >>> >>> PFB the error link: >>> >>> http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ >>> ------------------------------ >>> *From:* Monika Samal >>> *Sent:* Monday, August 3, 2020 2:08 PM >>> *To:* Michael Johnson >>> *Cc:* Fabian Zimmermann ; Amy Marrich < >>> amy at demarco.com>; openstack-discuss < >>> openstack-discuss at lists.openstack.org>; community at lists.openstack.org < >>> community at lists.openstack.org> >>> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> >>> Thanks a ton Michael for helping me out >>> ------------------------------ >>> *From:* Michael Johnson >>> *Sent:* Friday, July 31, 2020 3:57 AM >>> *To:* Monika Samal >>> *Cc:* Fabian Zimmermann ; Amy Marrich < >>> amy at demarco.com>; openstack-discuss < >>> openstack-discuss at lists.openstack.org>; community at lists.openstack.org < >>> community at lists.openstack.org> >>> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> >>> Just to close the loop on this, the octavia.conf file had >>> "project_name = admin" instead of "project_name = service" in the >>> [service_auth] section. This was causing the keystone errors when >>> Octavia was communicating with neutron. >>> >>> I don't know if that is a bug in kolla-ansible or was just a local >>> configuration issue. >>> >>> Michael >>> >>> On Thu, Jul 30, 2020 at 1:39 PM Monika Samal >>> wrote: >>> > >>> > Hello Fabian,, >>> > >>> > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ >>> > >>> > Regards, >>> > Monika >>> > ________________________________ >>> > From: Fabian Zimmermann >>> > Sent: Friday, July 31, 2020 1:57 AM >>> > To: Monika Samal >>> > Cc: Michael Johnson ; Amy Marrich < >>> amy at demarco.com>; openstack-discuss < >>> openstack-discuss at lists.openstack.org>; community at lists.openstack.org < >>> community at lists.openstack.org> >>> > Subject: Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> > >>> > Hi, >>> > >>> > just to debug, could you replace the auth_type password with >>> v3password? 
>>> > >>> > And do a curl against your :5000 and :35357 urls and paste the output. >>> > >>> > Fabian >>> > >>> > Monika Samal schrieb am Do., 30. Juli >>> 2020, 22:15: >>> > >>> > Hello Fabian, >>> > >>> > http://paste.openstack.org/show/796477/ >>> > >>> > Thanks, >>> > Monika >>> > ________________________________ >>> > From: Fabian Zimmermann >>> > Sent: Friday, July 31, 2020 1:38 AM >>> > To: Monika Samal >>> > Cc: Michael Johnson ; Amy Marrich < >>> amy at demarco.com>; openstack-discuss < >>> openstack-discuss at lists.openstack.org>; community at lists.openstack.org < >>> community at lists.openstack.org> >>> > Subject: Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> > >>> > The sections should be >>> > >>> > service_auth >>> > keystone_authtoken >>> > >>> > if i read the docs correctly. Maybe you can just paste your config >>> (remove/change passwords) to paste.openstack.org and post the link? >>> > >>> > Fabian >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0ne at e0ne.info Thu Aug 6 07:46:25 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Thu, 6 Aug 2020 10:46:25 +0300 Subject: [horizon] Victoria virtual mid-cycle poll In-Reply-To: References: Message-ID: Hi everybody, According to our poll [2] we'll have a one-hour mid-cycle poll today at 13.00 UTC. I'll share a Zoom link before the meeting today. We're going to discuss current release priorities [3] and our future plans. [2] https://doodle.com/poll/dkmsai49v4zzpca2 [3] https://etherpad.opendev.org/p/horizon-release-priorities Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Thu, Jul 30, 2020 at 1:00 PM Ivan Kolodyazhny wrote: > Hi team, > > If something can go wrong, it will definitely go wrong. > It means that I did a mistake in my original mail and sent you > completely wrong dates:(. > > Horizon Virtual mid-cycle is supposed to be next week Aug 5-7. I'm > planning to have a single one-hour session. > In case, if we've got a lot of participants and topic to discuss, we can > schedule one more session a week or two weeks later. > > Here is a correct poll: https://doodle.com/poll/dkmsai49v4zzpca2 > > Regards, > Ivan Kolodyazhny, > http://blog.e0ne.info/ > > > On Wed, Jul 22, 2020 at 10:26 AM Ivan Kolodyazhny wrote: > >> Hi team, >> >> As discussed at Horizon's Virtual PTG [1], we'll have a virtual mid-cycle >> meeting around Victoria-2 milestone. >> >> We'll discuss Horizon current cycle development priorities and the future >> of Horizon with modern JS frameworks. >> >> Please indicate your availability to meet for the first session, which >> will be held during the week of July 27-31: >> >> https://doodle.com/poll/3neps94amcreaw8q >> >> Please respond before 12:00 UTC on Tuesday 4 August. >> >> [1] https://etherpad.opendev.org/p/horizon-v-ptg >> >> Regards, >> Ivan Kolodyazhny, >> http://blog.e0ne.info/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0ne at e0ne.info Thu Aug 6 07:48:39 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Thu, 6 Aug 2020 10:48:39 +0300 Subject: [horizon] patternfly? 
In-Reply-To: <20200628140127.GA502608@straylight.m.ringlet.net> References: <20200623005343.rkgtee524s5tl7kx@yuggoth.org> <115da5a2-0bf1-4ec0-8ba6-0b3d1f3b9ab7@debian.org> <7406ea49-37ed-da56-24c5-786c342e632e@catalyst.net.nz> <20200628140127.GA502608@straylight.m.ringlet.net> Message-ID: Hi, We can discuss Horizon v next today during our mid-cycle call: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016346.html Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Sun, Jun 28, 2020 at 5:03 PM Peter Pentchev wrote: > On Wed, Jun 24, 2020 at 02:07:06PM +1200, Adrian Turjak wrote: > > On 24/06/20 1:05 am, Thomas Goirand wrote: > > > Anyone dismissing how huge of a problem this is, isn't doing serious > > > programming, for serious production use. That person would just be > doing > > > script kiddy work in the playground. Yes, it works, yes, it's shiny and > > > all. The upstream code may even be super nice, well written and all. > But > > > it's *NOT* serious to put such JS bundling approach in production. > > And yet people are running huge projects in production like this just > > fine. So clearly people are finding sane practices around it that give > > them enough security to feel safe that don't involve packaging each npm > > requirement as an OS package. How exactly are all the huge powerhouses > > doing it then when most of the internet's front end is giant js bundles > > built from npm dependencies? How does gitlab do it for their omnibus? > > From a cursory glance it did seem like they did use npm, and had a rake > > job to compile the js. Gitlab most definitely isn't "script kiddy work". > > > > I'm mostly a python dev, so I don't deal with npm often. When it comes > > to python though, other than underlying OS packages for python/pip > > itself, I use pip for installing my versions (in a venv or container). > > I've had too many painful cases of weird OS package versions, and I > > dislike the idea of relying on the OS when there is a perfectly valid > > and working package management system for my application requirements. I > > can audit the versions installed against known CVEs, and because I > > control the requirements, I can ensure I'm never using out of date > > libraries. > > > > Javascript and npm is only different because the sheer number of > > dependencies. Which is terrifying, don't get me wrong, but you can lock > > versions, you can audit them against CVEs, you can be warned if they are > > out of date. How other than by sheer scale is it really worse than pip > > if you follow some standards and a consistent process? > > What Thomas is trying to say, and I think other people in this thread > also agreed with, is that it's not "only" because of the sheer number of > dependencies. My personal opinion is that the Javascript ecosystem is > currently where Perl/CPAN was 25 years ago, Python was between 15 and 20 > years ago, and Ruby was 10-15 years ago: quite popular, attracting many > people who "just want to write a couple of lines of code to solve this > simple task", and, as a very logical consequence, full of small > libraries that various people developed to fix their own itches and just > released out into the wild without very much thought of long-term > maintenance. 
Now, this has several consequences (most of them have been > pointed out already): > > - there are many (not all, but many) developers who do not even try to > keep their own libraries backwards-compatible > > - there are many (not all, but many) developers who, once they have > written a piece of code that uses three libraries from other people, > do not really bother to follow the development of those libraries and > try to make their own piece of code compatible with their new versions > (this holds even more if there are not three, but fifteen libraries > from other people; it can be a bit hard to keep up with them all if > their authors do not care about API stability) > > - there are many libraries that lock the versions of their dependencies, > thus bringing back what was once known as "DLL hell", over and over > and over again (and yes, this happens in other languages, too) > > - there are many, many, *many* libraries that solve the same problems > over and over again in subtly different ways, either because their > authors were not aware of the other implementations or because said > other implementations could not exactly scratch the author's itch and > it was easier to write their own instead of spend some more time > trying to adapt the other one and propose changes to its author > (and, yes, I myself have been guilty of this in C, Perl, and Python > projects in the past; NIH is a very, very easy slope to slide down > along) > > I *think* that, with time, many Javascript developers will realize that > this situation is unsustainable in the long term, and, one by one, they > will start doing what C/C++, Perl, Python, and Ruby people have been > doing for some time now: > > - start thinking about backwards compatibility, think really hard before > making an incompatible change and, if they really have to, use > something like semantic versioning (not necessarily exactly semver, > but something similar) to signal the API breakage > > - once the authors of the libraries they depend on start doing this, > start encoding loose version requirements (not strictly pinned), such > as "dep >= 1.2.1, dep < 3". This is already done in many Python > packages, and OpenStack's upper-constraints machinery is a wonderful > example of how this can be maintained in a conservative manner that > virtually guarantees that the end result will work. > > - start wondering whether it is really worth it to maintain their own > pet implementation instead of extending a more-widely-used one, thus > eventually having the community settle on a well-known set of > more-or-less comprehensive and very widely tested packages for most > tasks. Once this happens, the authors of these widely-used libraries > absolutely *have* to keep some degree of backwards compatibility and > some kind of reasonable versioning scheme to signal changes. > > So, I'm kind of optimistic and I believe that, with time, the Javascript > ecosystem will become better. Unfortunately, this process has taken many > years for the other languages I've mentioned, and is not really fully > complete in any of them: any module repository has its share of > mostly-maintained reimplementations of various shapes and sizes of the > wheel. 
So I guess the point of all this was mostly to explain the > problem (once again) more than propose any short-term solutions :/ > > G'luck, > Peter > > -- > Peter Pentchev roam at ringlet.net roam at debian.org pp at storpool.com > PGP key: http://people.FreeBSD.org/~roam/roam.key.asc > Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Thu Aug 6 07:51:53 2020 From: ramishra at redhat.com (Rabi Mishra) Date: Thu, 6 Aug 2020 13:21:53 +0530 Subject: [tripleo] Deploy overcloud without provisioning In-Reply-To: References: Message-ID: On Thu, Aug 6, 2020 at 1:07 PM Marco Marino wrote: > Hi, > I'm trying to deploy an overcloud using tripleo with pre provisioned > nodes. My configuration is quite simple: > - 1 controller and 1 compute nodes on which I already installed CentOS 8.2 > - Both nodes have a dedicated idrac interface with an ip in > 192.168.199.0/24. Please note that this interface is not visible with "ip > a" or "ifconfig". It's a dedicated IDRAC interface > - Both nodes have a NIC configured in the subnet 192.168.199.0/24 > (192.168.199.200 and 192.168.199.201) > - Undercloud uses 192.168.199.0/24 as pxe/provisioning network (but I > don't need provisioning) > > Question: should I import nodes with "openstack overcloud node import > nodes.yaml" even if I don't need the provisioning step? > > Furthermore, on the undercloud I created one file: > /home/stack/templates/node-info.yaml with the following content > > parameter_defaults: > OvercloudControllerFlavor: control > OvercloudComputeFlavor: compute > ControllerCount: 1 > ComputeCount: 1 > > Question: How can I specify that "node X with ip Y should be used as a > controller and node Z with ip K should be used as a compute"?? > With pre-provisioned nodes (DeployedServer), you would need to specify HostnameMap and DeployedServerPortMap parameters that would map the pre-provisioned hosts and ctlplane ips. Please check documentation[1] for more details. [1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployed-server-with-config-download > Should I set the property with the following command? > openstack baremetal node set --property capabilities='profile:control' > controller1 > > Thank you, > Marco > > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From emiller at genesishosting.com Thu Aug 6 07:57:54 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 6 Aug 2020 02:57:54 -0500 Subject: [cinder][nova] Local storage in compute node References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA0481447F@gmsxchsvr01.thecreation.com> > No - a thin-provisioned LV in LVM would be best. From testing, it looks like thick-provisioned is the only choice at this stage. That's fine. > I will let everyone know how testing goes. So far, everything is working perfectly with Nova using LVM. It was a quick configuration and it did exactly what I expected, which is always nice. :) As far as performance goes, it is decent, but not stellar. Of course, I'm comparing crazy fast native NVMe storage in RAID 0 across 4 x Micron 9300 SSDs (using md as the underlying physical volume in LVM) to virtualized storage. 
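For context, that kind of local backing store is typically assembled along these lines (device names and the volume group name are only examples, not taken from this setup):

# RAID 0 across the four NVMe drives, used as the LVM physical volume
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
pvcreate /dev/md0
# volume group that nova.conf's images_volume_group would point at
vgcreate nova-vg /dev/md0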
Some numbers from fio, just to get an idea for how good/bad the IOPS will be:

Configuration:
32 core EPYC 7502P with 512GiB of RAM - CentOS 7 latest updates - Kolla Ansible (Stein) deployment
32 vCPU VM with 64GiB of RAM
32 x 10GiB test files (I'm using file tests, not raw device tests, so not optimal, but easiest when the VM root disk is the test disk)
iodepth=10
numjobs=32
time=30 (seconds)

The VM was deployed using a qcow2 image, then deployed as a raw image, to see the difference in performance.  There was none, which makes sense, since I'm pretty sure the qcow2 image was decompressed and stored in the LVM logical volume - so both tests were measuring the same thing.

Bare metal (random 4KiB reads):
8066MiB/sec
154.34 microsecond avg latency
2.065 million IOPS

VM qcow2 (random 4KiB reads):
589MiB/sec
2122.10 microsecond avg latency
151k IOPS

Bare metal (random 4KiB writes):
4940MiB/sec
252.44 microsecond avg latency
1.265 million IOPS

VM qcow2 (random 4KiB writes):
589MiB/sec
2119.16 microsecond avg latency
151k IOPS

Since the read and write VM results are nearly identical, my assumption is that the emulation layer is the bottleneck.  CPUs in the VM were all at 55% utilization (all kernel usage).  The qemu process on the bare metal machine indicated 1600% (or so) CPU utilization.

Below are runs with sequential 1MiB block tests:

Bare metal (sequential 1MiB reads):
13.3GiB/sec
23446.43 microsecond avg latency
13.7k IOPS

VM qcow2 (sequential 1MiB reads):
8378MiB/sec
38164.52 microsecond avg latency
8377 IOPS

Bare metal (sequential 1MiB writes):
8098MiB/sec
39488.00 microsecond avg latency
8097 IOPS

VM qcow2 (sequential 1MiB writes):
8087MiB/sec
39534.96 microsecond avg latency
8087 IOPS

Amazing that a VM can move 8GiB/sec to/from storage. :)  However, IOPS limits are a bit disappointing when compared to bare metal (but this is relative since 151k IOPS is quite a bit!).  Not sure if additional "iothreads" in QEMU would help, but that is not set in the Libvirt XML file, and I don't see any way to use Nova to set it.

The Libvirt XML for the disk appears as:
Any suggestions for improvement? I "think" that the "images_type = flat" option in nova.conf indicates that images are stored in the /var/lib/nova/instances/* directories? If so, that might be an option, but since we're using Kolla, that directory (or rather /var/lib/nova) is currently a docker volume. So, it might be necessary to mount the NVMe storage at its respective /var/lib/docker/volumes/nova_compute/_data/instances directory. Not sure if the "flat" option will be any faster, especially since Docker would be another layer to go through. Any opinions? Thanks! Eric From marino.mrc at gmail.com Thu Aug 6 08:05:21 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Thu, 6 Aug 2020 10:05:21 +0200 Subject: [tripleo] Deploy overcloud without provisioning In-Reply-To: References: Message-ID: Thank you Rabi, but node import is mandatory or not? Bare Metal part is useful for power management and I'd like to maintain this feature. Marco Il giorno gio 6 ago 2020 alle ore 09:52 Rabi Mishra ha scritto: > > > On Thu, Aug 6, 2020 at 1:07 PM Marco Marino wrote: > >> Hi, >> I'm trying to deploy an overcloud using tripleo with pre provisioned >> nodes. My configuration is quite simple: >> - 1 controller and 1 compute nodes on which I already installed CentOS 8.2 >> - Both nodes have a dedicated idrac interface with an ip in >> 192.168.199.0/24. Please note that this interface is not visible with >> "ip a" or "ifconfig". It's a dedicated IDRAC interface >> - Both nodes have a NIC configured in the subnet 192.168.199.0/24 >> (192.168.199.200 and 192.168.199.201) >> - Undercloud uses 192.168.199.0/24 as pxe/provisioning network (but I >> don't need provisioning) >> >> Question: should I import nodes with "openstack overcloud node import >> nodes.yaml" even if I don't need the provisioning step? >> >> Furthermore, on the undercloud I created one file: >> /home/stack/templates/node-info.yaml with the following content >> >> parameter_defaults: >> OvercloudControllerFlavor: control >> OvercloudComputeFlavor: compute >> ControllerCount: 1 >> ComputeCount: 1 >> >> > Question: How can I specify that "node X with ip Y should be used as a >> controller and node Z with ip K should be used as a compute"?? >> > > With pre-provisioned nodes (DeployedServer), you would need to specify > HostnameMap and DeployedServerPortMap parameters that would map the > pre-provisioned hosts and ctlplane ips. > > Please check documentation[1] for more details. > > [1] > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployed-server-with-config-download > > >> Should I set the property with the following command? >> openstack baremetal node set --property capabilities='profile:control' >> controller1 >> >> Thank you, >> Marco >> >> > > -- > Regards, > Rabi Mishra > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Thu Aug 6 08:23:32 2020 From: ramishra at redhat.com (Rabi Mishra) Date: Thu, 6 Aug 2020 13:53:32 +0530 Subject: [tripleo] Deploy overcloud without provisioning In-Reply-To: References: Message-ID: On Thu, Aug 6, 2020 at 1:35 PM Marco Marino wrote: > Thank you Rabi, > but node import is mandatory or not? Bare Metal part is useful for power > management and I'd like to maintain this feature. > > AFAIK, the intent of using pre-provisioned nodes is to create an overcloud without power management control, among other things. Node import is not required when Tripleo(Ironic) is not doing the provisioning. 
I don't know if there are ways for Ironic to do power management of pre-provisioned nodes. Someone else may have a better answer. > Marco > > Il giorno gio 6 ago 2020 alle ore 09:52 Rabi Mishra > ha scritto: > >> >> >> On Thu, Aug 6, 2020 at 1:07 PM Marco Marino wrote: >> >>> Hi, >>> I'm trying to deploy an overcloud using tripleo with pre provisioned >>> nodes. My configuration is quite simple: >>> - 1 controller and 1 compute nodes on which I already installed CentOS >>> 8.2 >>> - Both nodes have a dedicated idrac interface with an ip in >>> 192.168.199.0/24. Please note that this interface is not visible with >>> "ip a" or "ifconfig". It's a dedicated IDRAC interface >>> - Both nodes have a NIC configured in the subnet 192.168.199.0/24 >>> (192.168.199.200 and 192.168.199.201) >>> - Undercloud uses 192.168.199.0/24 as pxe/provisioning network (but I >>> don't need provisioning) >>> >>> Question: should I import nodes with "openstack overcloud node import >>> nodes.yaml" even if I don't need the provisioning step? >>> >>> Furthermore, on the undercloud I created one file: >>> /home/stack/templates/node-info.yaml with the following content >>> >>> parameter_defaults: >>> OvercloudControllerFlavor: control >>> OvercloudComputeFlavor: compute >>> ControllerCount: 1 >>> ComputeCount: 1 >>> >>> >> Question: How can I specify that "node X with ip Y should be used as a >>> controller and node Z with ip K should be used as a compute"?? >>> >> >> With pre-provisioned nodes (DeployedServer), you would need to specify >> HostnameMap and DeployedServerPortMap parameters that would map the >> pre-provisioned hosts and ctlplane ips. >> >> Please check documentation[1] for more details. >> >> [1] >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployed-server-with-config-download >> >> >>> Should I set the property with the following command? >>> openstack baremetal node set --property capabilities='profile:control' >>> controller1 >>> >>> Thank you, >>> Marco >>> >>> >> >> -- >> Regards, >> Rabi Mishra >> >> -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Thu Aug 6 10:02:26 2020 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Thu, 6 Aug 2020 12:02:26 +0200 Subject: [ops] [keystone] "Roles are not immutable" Message-ID: After updating to Train from rocky (on stein we just performed the db-sync), we tried the new "keystone-status update check" command which says that the admin role is not immutable [*]. As far as I understand this is something that was done to prevent deleting/modifying the default roles (that could cause major problems). But how am I supposed to fix this? The "--immutable" option for the "openstack role set" command, documented at: https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/role-v3.html is not available in Train. Thanks, Massimo [*] +-------------------------------------------+ | Check: Check default roles are immutable | | Result: Failure | | Details: Roles are not immutable: admin | +-------------------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Thu Aug 6 10:13:36 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Thu, 6 Aug 2020 12:13:36 +0200 Subject: [ironic] [infra] bifrost-integration-tinyipa-opensuse-15 broken Message-ID: Hi folks, Our openSUSE CI job has been broken for a few days [1]. 
It fails on the early bindep stage with [2] File './suse/x86_64/libJudy1-1.0.5-lp151.2.2.x86_64.rpm' not found on medium ' https://mirror.mtl01.inap.opendev.org/opensuse/distribution/leap/15.1/repo/oss/&apos ; I've raised it on #openstack-infra, but I'm not sure if there has been any follow up. Help is appreciated Dmitry [1] https://zuul.openstack.org/builds?job_name=bifrost-integration-tinyipa-opensuse-15 [2] https://zuul.openstack.org/build/f4c7d174d171482394d1d0754c863ae1/console -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Aug 6 10:15:26 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 06 Aug 2020 11:15:26 +0100 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA0481447D@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA0481447D@gmsxchsvr01.thecreation.com> Message-ID: <5136cf19c506c5fa8b0293a0b5f4f15cb714ce3b.camel@redhat.com> On Wed, 2020-08-05 at 23:53 -0500, Eric K. Miller wrote: > > From: Sean Mooney [mailto:smooney at redhat.com] > > Sent: Wednesday, August 05, 2020 8:01 AM > > yes that works well with the default flat/qcow file format > > i assume there was a reason this was not the starting point. > > the nova lvm backend i think does not supprot thin provisioning > > so fi you did the same thing creating the volume group on the nvme deivce > > you would technically get better write performance after the vm is booted > > but > > the vm spwan is slower since we cant take advantage of thin providioning > > and > > each root disk need to be copided form the cahced image. > > I wasn't aware that the nova LVM backend ([libvirt]/images_type = lvm) didn't support thin provisioned LV's. However, > I do see that the "sparse_logical_volumes" parameter indicates it has been deprecated: > https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.sparse_logical_volumes > > That would definitely be a downer. > > > so just monting the nova data directory on an nvme driver or a raid of nvme > > drives > > works well and is simple to do. > > Maybe we should consider doing this instead. I'll test with the Nova LVM backend first. > > > so there are trade off with both appoches. > > generally i recommend using local sotrage e.g. the vm root disk or ephemeral > > disk for fast scratchpad space > > to work on data bug persitie all relevent data permently via cinder volumes. > > that requires you to understand which block > > devices a local and which are remote but it give you the best of both worlds. > > Our use case simply requires high-speed non-redundant storage for self-replicating applications like Couchbase, > Cassandra, MongoDB, etc. or very inexpensive VMs that are backed-up often and can withstand the downtime when > restoring from backup. > > That will be one more requirement (or rather a very nice to have), is to be able to create images (backups) of the > local storage onto object storage, so hopefully "openstack server backup create" works like it does with rbd-backed > Nova-managed persistent storage. 
it wil snapshot the root disk if you use addtional ephmeeral disks i do not think they are included but if you create the vms wit a singel root disk that is big enaough for your needs and use swift as your glance backend then yes. it will store the backups in object storage and rotate up to N backups per instance. > > I will let you know what I find out! > > Thanks everyone! > > Eric From emiller at genesishosting.com Thu Aug 6 10:26:59 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Thu, 6 Aug 2020 05:26:59 -0500 Subject: [cinder][nova] Local storage in compute node In-Reply-To: <5136cf19c506c5fa8b0293a0b5f4f15cb714ce3b.camel@redhat.com> References: <046E9C0290DD9149B106B72FC9156BEA04814477@gmsxchsvr01.thecreation.com> <046E9C0290DD9149B106B72FC9156BEA04814478@gmsxchsvr01.thecreation.com> <20200805111934.77lesgmmdiqeo27m@lyarwood.usersys.redhat.com> <7b7f6e277f77423ae6502d81c6d778fd4249b99d.camel@redhat.com> <92839697a08966dc17cd5c4c181bb32e2d197f93.camel@redhat.com> <4f025d444406898903dabf3049ed021822cce19b.camel@redhat.com> <046E9C0290DD9149B106B72FC9156BEA0481447D@gmsxchsvr01.thecreation.com> <5136cf19c506c5fa8b0293a0b5f4f15cb714ce3b.camel@redhat.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814483@gmsxchsvr01.thecreation.com> > it wil snapshot the root disk > if you use addtional ephmeeral disks i do not think they are included > but if you create the vms wit a singel root disk that is big enaough for your > needs and use swift as your glance backend > then yes. it will store the backups in object storage and rotate up to N > backups per instance. Thanks Sean! I tested a VM with a single root disk (no ephemeral disks) and it worked as expected (how you described). From e0ne at e0ne.info Thu Aug 6 12:03:53 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Thu, 6 Aug 2020 15:03:53 +0300 Subject: [horizon] Victoria virtual mid-cycle poll In-Reply-To: References: Message-ID: Here is Zoom connection details: Topic: Horizon Virtual Mid-Cycle Time: Aug 6, 2020 01:00 PM Universal Time UTC Join Zoom Meeting https://zoom.us/j/94173501669?pwd=c3JuNnpJMnBvNzgzdVJ5NDRhMnlhQT09 Meeting ID: 941 7350 1669 Passcode: 710495 One tap mobile +16468769923,,94173501669#,,,,,,0#,,710495# US (New York) +16699006833,,94173501669#,,,,,,0#,,710495# US (San Jose) Dial by your location +1 646 876 9923 US (New York) +1 669 900 6833 US (San Jose) +1 253 215 8782 US (Tacoma) +1 301 715 8592 US (Germantown) +1 312 626 6799 US (Chicago) +1 346 248 7799 US (Houston) +1 408 638 0968 US (San Jose) Meeting ID: 941 7350 1669 Passcode: 710495 Find your local number: https://zoom.us/u/ah3SiLk1q Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Thu, Aug 6, 2020 at 10:46 AM Ivan Kolodyazhny wrote: > Hi everybody, > > According to our poll [2] we'll have a one-hour mid-cycle poll today at > 13.00 UTC. I'll share a Zoom link before the meeting today. > > We're going to discuss current release priorities [3] and our future plans. > > > [2] https://doodle.com/poll/dkmsai49v4zzpca2 > [3] https://etherpad.opendev.org/p/horizon-release-priorities > > Regards, > Ivan Kolodyazhny, > http://blog.e0ne.info/ > > > On Thu, Jul 30, 2020 at 1:00 PM Ivan Kolodyazhny wrote: > >> Hi team, >> >> If something can go wrong, it will definitely go wrong. >> It means that I did a mistake in my original mail and sent you >> completely wrong dates:(. >> >> Horizon Virtual mid-cycle is supposed to be next week Aug 5-7. I'm >> planning to have a single one-hour session. 
>> In case, if we've got a lot of participants and topic to discuss, we can >> schedule one more session a week or two weeks later. >> >> Here is a correct poll: https://doodle.com/poll/dkmsai49v4zzpca2 >> >> Regards, >> Ivan Kolodyazhny, >> http://blog.e0ne.info/ >> >> >> On Wed, Jul 22, 2020 at 10:26 AM Ivan Kolodyazhny wrote: >> >>> Hi team, >>> >>> As discussed at Horizon's Virtual PTG [1], we'll have a virtual >>> mid-cycle meeting around Victoria-2 milestone. >>> >>> We'll discuss Horizon current cycle development priorities and the >>> future of Horizon with modern JS frameworks. >>> >>> Please indicate your availability to meet for the first session, which >>> will be held during the week of July 27-31: >>> >>> https://doodle.com/poll/3neps94amcreaw8q >>> >>> Please respond before 12:00 UTC on Tuesday 4 August. >>> >>> [1] https://etherpad.opendev.org/p/horizon-v-ptg >>> >>> Regards, >>> Ivan Kolodyazhny, >>> http://blog.e0ne.info/ >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Thu Aug 6 14:04:21 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 6 Aug 2020 14:04:21 +0000 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <88c24f3a-7d29-aa39-ed12-803279cc90c1@openstack.org> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <88c24f3a-7d29-aa39-ed12-803279cc90c1@openstack.org> Message-ID: <20200806140421.GN31915@sync> Hey all, Thanks for your replies. About the fact that nova already implement this, I will try again on my side, but maybe it was not yet implemented in newton (I only tried nova on newton version). Thank you for bringing that to me. About the healhcheck already done on nova side (and also on neutron). As far as I understand, it's done using a specific rabbit queue, which can work while others queues are not working. The purpose of adding ping endpoint here is to be able to ping in all topics, not only those used for healthcheck reports. Also, as mentionned by Thierry, what we need is a way to externally do pings toward neutron agents and nova computes. The patch itself is not going to add any load on rabbit. It really depends on the way the operator will use it. On my side, I built a small external oslo.messaging script which I can use to do such pings. Cheers, -- Arnaud Morin On 03.08.20 - 12:15, Thierry Carrez wrote: > Ken Giusti wrote: > > On Mon, Jul 27, 2020 at 1:18 PM Dan Smith > > wrote: > > > The primary concern was about something other than nova sitting on our > > > bus making calls to our internal services. I imagine that the proposal > > > to bake it into oslo.messaging is for the same purpose, and I'd probably > > > have the same concern. At the time I think we agreed that if we were > > > going to support direct-to-service health checks, they should be teensy > > > HTTP servers with oslo healthchecks middleware. Further loading down > > > rabbit with those pings doesn't seem like the best plan to > > > me. Especially since Nova (compute) services already check in over RPC > > > periodically and the success of that is discoverable en masse through > > > the API. > > > > While initially in favor of this feature Dan's concern has me > > reconsidering this. > > > > Now I believe that if the purpose of this feature is to check the > > operational health of a service _using_ oslo.messaging, then I'm against > > it.   
A naked ping to a generic service point in an application doesn't > > prove the operating health of that application beyond its connection to > > rabbit. > > While I understand the need to further avoid loading down Rabbit, I like the > universality of this solution, solving a real operational issue. > > Obviously that creates a trade-off (further loading rabbit to get more > operational insights), but nobody forces you to run those ping calls, they > would be opt-in. So the proposed code in itself does not weigh down Rabbit, > or make anything sit on the bus. > > > Connectivity monitoring between an application and rabbit is done using > > the keepalive connection heartbeat mechanism built into the rabbit > > protocol, which O.M. supports today. > > I'll let Arnaud answer, but I suspect the operational need is code-external > checking of the rabbit->agent chain, not code-internal checking of the > agent->rabbit chain. The heartbeat mechanism is used by the agent to keep > the Rabbit connection alive, ensuring it works in most of the cases. The > check described above is to catch the corner cases where it still doesn't. > > -- > Thierry Carrez (ttx) > From marino.mrc at gmail.com Thu Aug 6 14:06:08 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Thu, 6 Aug 2020 16:06:08 +0200 Subject: [tripleo] Overcloud without provisioning error: os-net-config command not found Message-ID: Hi, I'm trying to deploy an overcloud using pre-provisioned nodes. I have only 2 nodes, 1 compute and 1 controller and here is the command I'm using: openstack overcloud deploy --templates --disable-validations -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml -e /home/stack/templates/node-info.yaml -e /home/stack/templates/ctlplane-assignments.yaml -e /home/stack/templates/hostname-map.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -n /home/stack/os-deploy-custom-config/network_data.yaml --overcloud-ssh-user stack --overcloud-ssh-key /home/stack/.ssh/id_rsa Here is the content of custom files: (undercloud) [stack at undercloud ~]$ cat templates/node-info.yaml parameter_defaults: OvercloudControllerFlavor: control OvercloudComputeFlavor: compute ControllerCount: 1 ComputeCount: 1 (undercloud) [stack at undercloud ~]$ cat templates/ctlplane-assignments.yaml resource_registry: OS::TripleO::DeployedServer::ControlPlanePort: /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-neutron-port.yaml parameter_defaults: DeployedServerPortMap: controller-0-ctlplane: fixed_ips: - ip_address: 192.168.199.200 subnets: - cidr: 192.168.199.0/24 network: tags: 192.168.199.0/24 compute-0-ctlplane: fixed_ips: - ip_address: 192.168.199.210 subnets: - cidr: 192.168.199.0/24 network: tags: 192.168.199.0/24 (undercloud) [stack at undercloud ~]$ cat templates/hostname-map.yaml parameter_defaults: HostnameMap: overcloud-controller-0: controller-0 overcloud-novacompute-0: compute-0 http://paste.openstack.org/show/796634/ <-- Here is the complete output for overcloud deploy command. It seems that the error is /var/lib/tripleo-config/scripts/run_os_net_config.sh: line 59: os-net-config: command not found" os-net-config is provided by "delorean-component-tripleo" repository. 
So my question is: should I pre install Openstack repositories on pre-provisioned nodes in addition to operating system installation and network configuration? Thank you, Marco -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Thu Aug 6 14:11:32 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 6 Aug 2020 14:11:32 +0000 Subject: [largescale-sig] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> Message-ID: <20200806141132.GO31915@sync> Hi Mohammed, 1 - That's something we would also like, but it's beyond the patch I propose. I need this patch not only for kubernetes, but also for monitoring my legagy openstack agents running outside of k8s. 2 - Yes, latest version of rabbitmq is better on that point, but we still see some weird issue (I will ask the community about it in another topic). 3 - Thanks for this operator, we'll take a look! By saying 1 rabbit per service, I understand 1 server, not 1 cluster, right? That sounds risky if you lose the server. I suppose you dont do that for the database? 4 - Nice, how to you monitor those consumptions? Using rabbit management API? Cheers, -- Arnaud Morin On 03.08.20 - 10:21, Mohammed Naser wrote: > I have a few operational suggestions on how I think we could do this best: > > 1. I think exposing a healthcheck endpoint that _actually_ runs the > ping and responds with a 200 OK makes a lot more sense in terms of > being able to run it inside something like Kubernetes, you end up with > a "who makes the ping and who responds to it" type of scenario which > can be tricky though I'm sure we can figure that out > 2. I've found that newer releases of RabbitMQ really help with those > un-usable queues after a split, I haven't had any issues at all with > newer releases, so that could be something to help your life be a lot > easier. > 3. You mentioned you're moving towards Kubernetes, we're doing the > same and building an operator: > https://opendev.org/vexxhost/openstack-operator -- Because the > operator manages the whole thing and Kubernetes does it's thing too, > we started moving towards 1 (single) rabbitmq per service, which > reaaaaaaally helped a lot in stabilizing things. Oslo messaging is a > lot better at recovering when a single service IP is pointing towards > it because it doesn't do weird things like have threads trying to > connect to other Rabbit ports. Just a thought. > 4. In terms of telemetry and making sure you avoid that issue, we > track the consumption rates of queues inside OpenStack. OpenStack > consumption rate should be constant and never growing, anytime it > grows, we instantly detect that something is fishy. However, the > other issue comes in that when you restart any openstack service, it > 'forgets' all it's existing queues and then you have a set of building > up queues until they automatically expire which happens around 30 > minutes-ish, so it makes that alarm of "things are not being consumed" > a little noisy if you're restarting services > > Sorry for the wall of super unorganized text, all over the place here > but thought I'd chime in with my 2 cents :) > > On Mon, Jul 27, 2020 at 6:04 AM Arnaud Morin wrote: > > > > Hey all, > > > > TLDR: I propose a change to oslo_messaging to allow doing a ping over RPC, > > this is useful to monitor liveness of agents. > > > > > > Few weeks ago, I proposed a patch to oslo_messaging [1], which is adding a > > ping endpoint to RPC dispatcher. 
> > It means that every openstack service which is using oslo_messaging RPC > > endpoints (almosts all OpenStack services and agents - e.g. neutron > > server + agents, nova + computes, etc.) will then be able to answer to a > > specific "ping" call over RPC. > > > > I decided to propose this patch in my company mainly for 2 reasons: > > 1 - we are struggling monitoring our nova compute and neutron agents in a > > correct way: > > > > 1.1 - sometimes our agents are disconnected from RPC, but the python process > > is still running. > > 1.2 - sometimes the agent is still connected, but the queue / binding on > > rabbit cluster is not working anymore (after a rabbit split for > > example). This one is very hard to debug, because the agent is still > > reporting health correctly on neutron server, but it's not able to > > receive messages anymore. > > > > > > 2 - we are trying to monitor agents running in k8s pods: > > when running a python agent (neutron l3-agent for example) in a k8s pod, we > > wanted to find a way to monitor if it is still live of not. > > > > > > Adding a RPC ping endpoint could help us solve both these issues. > > Note that we still need an external mechanism (out of OpenStack) to do this > > ping. > > We also think it could be nice for other OpenStackers, and especially > > large scale ops. > > > > Feel free to comment. > > > > > > [1] https://review.opendev.org/#/c/735385/ > > > > > > -- > > Arnaud Morin > > > > > > > -- > Mohammed Naser > VEXXHOST, Inc. From sgolovat at redhat.com Thu Aug 6 14:16:56 2020 From: sgolovat at redhat.com (Sergii Golovatiuk) Date: Thu, 6 Aug 2020 16:16:56 +0200 Subject: [tripleo][ci] Make tripleo-ci-centos-8-containerized-undercloud-upgrades voting again Message-ID: Hi, tripleo-ci-centos-8-containerized-undercloud-upgrades has been improved significantly in terms of stability [1]. To improve CI coverage for upgrades I propose to make it voting. That will help to make upgrades more stable and catch bugs as early as possible. To keep it stable, Upgrade team is going to add it to their own triage process and dedicate the engineer to fix it if it's red for 2-3 days in a row. [1] https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-containerized-undercloud-upgrades&project=openstack/tripleo-common -- Sergii Golovatiuk Senior Software Developer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Thu Aug 6 14:31:24 2020 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 6 Aug 2020 08:31:24 -0600 Subject: [tripleo] Overcloud without provisioning error: os-net-config command not found In-Reply-To: References: Message-ID: On Thu, Aug 6, 2020 at 8:13 AM Marco Marino wrote: > > Hi, I'm trying to deploy an overcloud using pre-provisioned nodes. 
I have only 2 nodes, 1 compute and 1 controller and here is the command I'm using: > > > openstack overcloud deploy --templates --disable-validations -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml -e /home/stack/templates/node-info.yaml -e /home/stack/templates/ctlplane-assignments.yaml -e /home/stack/templates/hostname-map.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -n /home/stack/os-deploy-custom-config/network_data.yaml --overcloud-ssh-user stack --overcloud-ssh-key /home/stack/.ssh/id_rsa > > Here is the content of custom files: > > (undercloud) [stack at undercloud ~]$ cat templates/node-info.yaml > parameter_defaults: > OvercloudControllerFlavor: control > OvercloudComputeFlavor: compute > ControllerCount: 1 > ComputeCount: 1 > > (undercloud) [stack at undercloud ~]$ cat templates/ctlplane-assignments.yaml > resource_registry: > OS::TripleO::DeployedServer::ControlPlanePort: /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-neutron-port.yaml > > parameter_defaults: > DeployedServerPortMap: > controller-0-ctlplane: > fixed_ips: > - ip_address: 192.168.199.200 > subnets: > - cidr: 192.168.199.0/24 > network: > tags: > 192.168.199.0/24 > compute-0-ctlplane: > fixed_ips: > - ip_address: 192.168.199.210 > subnets: > - cidr: 192.168.199.0/24 > network: > tags: > 192.168.199.0/24 > > > (undercloud) [stack at undercloud ~]$ cat templates/hostname-map.yaml > parameter_defaults: > HostnameMap: > overcloud-controller-0: controller-0 > overcloud-novacompute-0: compute-0 > > > http://paste.openstack.org/show/796634/ <-- Here is the complete output for overcloud deploy command. > It seems that the error is > /var/lib/tripleo-config/scripts/run_os_net_config.sh: line 59: os-net-config: command not found" > > os-net-config is provided by "delorean-component-tripleo" repository. So my question is: should I pre install Openstack repositories on pre-provisioned nodes in addition to operating system installation and network configuration? > Yes per the documentation: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#package-repositories > Thank you, > Marco > > > > From arnaud.morin at gmail.com Thu Aug 6 14:40:16 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 6 Aug 2020 14:40:16 +0000 Subject: [nova][neutron][oslo][ops] rabbit bindings issue Message-ID: <20200806144016.GP31915@sync> Hey all, I would like to ask the community about a rabbit issue we have from time to time. In our current architecture, we have a cluster of rabbits (3 nodes) for all our OpenStack services (mostly nova and neutron). When one node of this cluster is down, the cluster continue working (we use pause_minority strategy). But, sometimes, the third server is not able to recover automatically and need a manual intervention. After this intervention, we restart the rabbitmq-server process, which is then able to join the cluster back. At this time, the cluster looks ok, everything is fine. BUT, nothing works. Neutron and nova agents are not able to report back to servers. They appear dead. Servers seems not being able to consume messages. The exchanges, queues, bindings seems good in rabbit. 
What we see is that removing bindings (using rabbitmqadmin delete binding or the web interface) and recreate them again (using the same routing key) brings the service back up and running. Doing this for all queues is really painful. Our next plan is to automate it, but is there anyone in the community already saw this kind of issues? Our bug looks like the one described in [1]. Someone recommands to create an Alternate Exchange. Is there anyone already tried that? FYI, we are running rabbit 3.8.2 (with OpenStack Stein). We had the same kind of issues using older version of rabbit. Thanks for your help. [1] https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk -- Arnaud Morin From bdobreli at redhat.com Thu Aug 6 15:02:34 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Thu, 6 Aug 2020 17:02:34 +0200 Subject: [tripleo][ci] Make tripleo-ci-centos-8-containerized-undercloud-upgrades voting again In-Reply-To: References: Message-ID: +1 On 8/6/20 4:16 PM, Sergii Golovatiuk wrote: > Hi, > > tripleo-ci-centos-8-containerized-undercloud-upgrades has been improved > significantly in terms of stability [1]. To improve CI coverage for > upgrades I propose to make it voting. That will help to make upgrades > more stable and catch bugs as early as possible. To keep it stable, > Upgrade team is going to add it to their own triage process and > dedicate the engineer to fix it if it's red for 2-3 days in a row. > > [1] > https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-containerized-undercloud-upgrades&project=openstack/tripleo-common > > -- > SergiiGolovatiuk > > Senior Software Developer > > Red Hat > > > -- Best regards, Bogdan Dobrelya, Irc #bogdando From marios at redhat.com Thu Aug 6 15:51:07 2020 From: marios at redhat.com (Marios Andreou) Date: Thu, 6 Aug 2020 18:51:07 +0300 Subject: [tripleo][ci] Make tripleo-ci-centos-8-containerized-undercloud-upgrades voting again In-Reply-To: References: Message-ID: On Thu, Aug 6, 2020 at 5:19 PM Sergii Golovatiuk wrote: > Hi, > > tripleo-ci-centos-8-containerized-undercloud-upgrades has been improved > significantly in terms of stability [1]. To improve CI coverage for > upgrades I propose to make it voting. That will help to make upgrades more > stable and catch bugs as early as possible. To keep it stable, Upgrade team > is going to add it to their own triage process and dedicate the engineer to > fix it if it's red for 2-3 days in a row. > > o/ as discussed on irc, IMO we should make it voting "until we can't". Your triage process about 2/3 days sounds reasonable but it will be seen in practice how well that works. Which is the original reason there is push-back against master voting upgrades jobs - i.e. whilst developing for the cycle the upgrades jobs might break until all new features are merged and you can accommodate for them in the upgrade. So they break and this blocks master gates. They were made non voting during a time that they were broken often and for long periods. let's make them voting "until we can't". > [1] > https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-containerized-undercloud-upgrades&project=openstack/tripleo-common > > -- > Sergii Golovatiuk > > Senior Software Developer > > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zaitcev at redhat.com Thu Aug 6 16:41:25 2020 From: zaitcev at redhat.com (Pete Zaitcev) Date: Thu, 6 Aug 2020 11:41:25 -0500 Subject: [TripleO] "bundle install" in puppet-tripleo Message-ID: <20200806114125.0f0961a9@suzdal.zaitcev.lan> Hello: Due to some circumstances, I started looking at running unit tests in puppet-tripleo. The official document[1] tells me to start by running "bundle install". This results in: [zaitcev at suzdal puppet-tripleo-c744015]$ bundle install Fetching https://git.openstack.org/openstack/puppet-openstack_spec_helper Fetching gem metadata from https://rubygems.org/........ Resolving dependencies........ Fetching rake 13.0.1 ....... Installing netaddr 1.5.1 Using pathspec 0.2.1 <--------------------- in black, not green Fetching pry 0.12.2 ....... Installing webmock 3.8.3 Using puppet-openstack_spec_helper 17.0.0 from https://git.openstack.org/openstack/puppet-openstack_spec_helper (at master at 273d24f) Updating files in vendor/cache Could not find pathspec-0.2.1.gem for installation [zaitcev at suzdal puppet-tripleo-c744015]$ Anyone got an idea what the above means? -- Pete [1] https://docs.openstack.org/puppet-openstack-guide/latest/contributor/testing.html From caifti at gmail.com Thu Aug 6 07:00:44 2020 From: caifti at gmail.com (Doina Cristina Duma) Date: Thu, 6 Aug 2020 09:00:44 +0200 Subject: [TC] [PTG] Victoria vPTG Summary of Conversations and Action Items In-Reply-To: References: Message-ID: Hello everyone, On Tue, Aug 4, 2020 at 2:14 PM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi everyone, > the problem described in the "OpenStack User-facing APIs" is something > that we face daily in our deployment. Different CLIs for different > operations. > same for us, really frustrating, going around and see what is missing (what options) > I'm really interested in driving this action item. > I totally support your proposal! Cristina > > Belmiro > > On Fri, Jun 12, 2020 at 9:38 PM Kendall Nelson > wrote: > >> Hello Everyone! >> >> I hope you all had a productive and enjoyable PTG! While it’s still >> reasonably fresh, I wanted to take a moment to summarize discussions and >> actions that came out of TC discussions. >> >> If there is a particular action item you are interested in taking, please >> reply on this thread! >> >> For the long version, check out the etherpad from the PTG[1]. >> >> Tuesday >> >> ====== >> >> Ussuri Retrospective >> >> ---------------------------- >> >> As usual we accomplished a lot. Some of the things we accomplished were >> around enumerating operating systems per release (again), removing python2 >> support, and adding the ideas repository. Towards the end of the release, >> we had a lot of discussions around what to do with leaderless projects, the >> role of PTLs, and what to do with projects that were missing PTL candidates >> for the next release. We discussed office hours, their history and reason >> for existence, and clarified how we can strengthen communication amongst >> ourselves, the projects, and the larger community. >> >> TC Onboarding >> >> -------------------- >> >> It was brought up that those elected most recently (and even new members >> the election before) felt like there wasn’t enough onboarding into the TC. >> Through discussion about what we can do to better support returning members >> is to better document the daily, weekly and monthly tasks TC members are >> supposed to be doing. 
Kendall Nelson proposed a patch to start adding more >> detail to a guide for TC members already[2]. It was also proposed that we >> have a sort of mentorship or shadow program for people interested in >> joining the TC or new TC members by more experienced TC members. The >> discussion about the shadow/mentorship program is to be continued. >> >> TC/UC Merge >> >> ------------------ >> >> Thierry gave an update on the merge of the committees. The simplified >> version is that the current proposal is that UC members are picked from TC >> members, the UC operates within the TC, and that we are already setup for >> this given the number of TC members that have AUC status. None of this >> requires a by-laws change. One next step that has already begun is the >> merging of the openstack-users ML into openstack-discuss ML. Other next >> steps are to decide when to do the actual transition (disbanding the >> separate UC, probably at the next election?) and when to setup AUC’s to be >> defined as extra-ATC’s to be included in the electorate for elections. For >> more detail, check out the openstack-discuss ML thread[3]. >> >> Wednesday >> >> ========= >> >> Help Wanted List >> >> ----------------------- >> >> We settled on a format for the job postings and have several on the list. >> We talked about how often we want to look through, update or add to it. The >> proposal is to do this yearly. We need to continue pushing on the board to >> dedicate contributors at their companies to work on these items, and get >> them to understand that it's an investment that will take longer than a >> year in a lot of cases; interns are great, but not enough. >> >> TC Position on Foundation Member Community Contributions >> >> >> ---------------------------------------------------------------------------------- >> >> The discussion started with a state of things today - the expectations of >> platinum members, the benefits the members get being on the board and why >> they should donate contributor resources for these benefits, etc. A variety >> of proposals were made: either enforce or remove the minimum contribution >> level, give gold members the chance to have increased visibility (perhaps >> giving them some of the platinum member advantages) if they supplement >> their monetary contributions with contributor contributions, etc. The >> #ACTION that was decided was for Mohammed to take these ideas to the board >> and see what they think. >> >> OpenStack User-facing APIs >> >> -------------------------------------- >> >> Users are confused about the state of the user facing API’s; they’ve been >> told to use the OpenStackClient(OSC) but upon use, they discover that there >> are features missing that exist in the python-*clients. Partial >> implementation in the OSC is worse than if the service only used their >> specific CLI. Members of the OpenStackSDK joined discussions and explained >> that many of the barriers that projects used to have behind implementing >> certain commands have been resolved. The proposal is to create a pop up >> team and that they start with fully migrating Nova, documenting the process >> and collecting any other unresolved blocking issues with the hope that one >> day we can set the migration of the remaining projects as a community goal. >> Supplementally, a new idea was proposed- enforcing new functionality to >> services is only added to the SDK (and optionally the OSC) and not the >> project’s specific CLI to stop increasing the disparity between the two. 
>> The #ACTION here is to start the pop up team, if you are interested, please >> reply! Additionally, if you disagree with this kind of enforcement, please >> contact the TC as soon as possible and explain your concerns. >> >> PTL Role in OpenStack today & Leaderless Projects >> >> --------------------------------------------------------------------- >> >> This was a veeeeeeeerrrry long conversation that went in circles a few >> times. The very short version is that we, the TC, are willing to let >> project teams decide for themselves if they want to have a more >> deconstructed kind of PTL role by breaking it into someone responsible for >> releases and someone responsible for security issues. This new format also >> comes with setting the expectation that for things like project updates and >> signing up for PTG time, if someone on the team doesn’t actively take that >> on, the default assumption is that the project won’t participate. The >> #ACTION we need someone to take on is to write a resolution about how this >> will work and how it can be done. Ideally, this would be done before the >> next technical election, so that teams can choose it at that point. If you >> are interested in taking on the writing of this resolution, please speak up! >> >> Cross Project Work >> >> ------------------------- >> >> -Pop Up Teams- >> >> The two teams we have right now are Encryption and Secure Consistent >> Policy Groups. Both are making slow progress and will continue. >> >> >> >> -Reducing Community Goals Per Cycle- >> >> Historically we have had two goals per cycle, but for smaller teams this >> can be a HUGE lift. The #ACTION is to clearly outline the documentation for >> the goal proposal and selection process to clarify that selecting only one >> goal is fine. No one has claimed this action item yet. >> >> -Victoria Goal Finalization- >> >> Currently, we have three proposals and one accepted goal. If we are going >> to select a second goal, it needs to be done ASAP as Victoria development >> has already begun. All TC members should review the last proposal >> requesting selection[4]. >> >> -Wallaby Cycle Goal Discussion Kick Off- >> >> Firstly, there is a #ACTION that one or two TC members are needed to >> guide the W goal selection. If you are interested, please reply to this >> thread! There were a few proposed goals for VIctoria that didn’t make it >> that could be the starting point for W discussions, in particular, the >> rootwrap goal which would be good for operators. The OpenStackCLI might be >> another goal to propose for Wallaby. >> >> Detecting Unmaintained Projects Early >> >> --------------------------------------------------- >> >> The TC liaisons program had been created a few releases ago, but the >> initial load on TC members was large. We discussed bringing this program >> back and making the project health checks happen twice a release, either >> the start or end of the release and once in the middle. TC liaisons will >> look at previously proposed releases, release activity of the team, the >> state of tempest plugins, if regular meetings are happening, if there are >> patches in progress and how busy the project’s IRC channel is to make a >> determination. Since more than one liaison will be assigned to each >> project, those liaisons can divvy up the work how they see fit. The other >> aspect that still needs to be decided is where the health checks will be >> recorded- in a wiki? In a meeting and meeting logs? That decision is still >> to be continued. 
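As a rough sketch of how a liaison might gauge one of these signals (patches merged recently), the public Gerrit REST API on review.opendev.org can be queried directly. The project name below is only an example and the 90-day window is an assumption, not an agreed threshold.

    # Counts changes merged in the last 90 days for one project.
    # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI, so the
    # first line is stripped before parsing. Results are capped by the
    # server's query limit, which is fine for a coarse health signal.
    import json
    import requests

    project = "openstack/cloudkitty"
    resp = requests.get(
        "https://review.opendev.org/changes/",
        params={"q": f"project:{project} status:merged -age:90d"},
        timeout=30,
    )
    changes = json.loads(resp.text.split("\n", 1)[1])
    print(f"{project}: {len(changes)} changes merged in the last 90 days")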
The current #ACTION currently unassigned is that we need >> to assign liaisons for the Victoria cycle and decide when to do the first >> health check. >> >> Friday >> >> ===== >> >> Reducing Systems and Friction to Drive Change >> >> ---------------------------------------------------------------- >> >> This was another conversation that went in circles a bit before realizing >> that we should make a list of the more specific problems we want to address >> and then brainstorm solutions for them. The list we created (including >> things already being worked) are as follows: >> >> - >> >> TC separate from UC (solution in progress) >> - >> >> Stable releases being approved by a separate team (solution in >> progress) >> - >> >> Making repository creation faster (especially for established project >> teams) >> - >> >> Create a process blueprint for project team mergers >> - >> >> Requirements Team being one person >> - >> >> Stable Team >> - >> >> Consolidate the agent experience >> - >> >> Figure out how to improve project <--> openstack client/sdk >> interaction. >> >> If you feel compelled to pick one of these things up and start proposing >> solutions or add to the list, please do! >> >> Monitoring in OpenStack (Ceilometer + Telemetry + Gnocchi State) >> >> >> ----------------------------------------------------------------------------------------- >> >> This conversation is also ongoing, but essentially we talked about the >> state of things right now- largely they are not well maintained and there >> is added complexity with Ceilometers being partially dependent on Gnocchi. >> There are a couple of ideas to look into like using oslo.metrics for the >> interface between all the tools or using Ceilometer without Gnocchi if we >> can clean up those dependencies. No specific action items here, just please >> share your thoughts if you have them. >> >> Ideas Repo Next Steps >> >> ------------------------------- >> >> Out of the Ussuri retrospective, it was brought up that we probably >> needed to talk a little more about what we wanted for this repo. >> Essentially we just want it to be a place to collect ideas into without >> worrying about the how. It should be a place to document ideas we have had >> (old and new) and keep all the discussion in one place as opposed to >> historic email threads, meetings logs, other IRC logs, etc. We decided it >> would be good to periodically go through this repo, likely as a forum >> session at a summit to see if there is any updating that could happen or >> promotion of ideas to community goals, etc. >> >> ‘tc:approved-release’ Tag >> >> --------------------------------- >> >> This topic was proposed by the Manila team from a discussion they had >> earlier in the week. We talked about the history of the tag and how usage >> of tags has evolved. At this point, the proposal is to remove the tag as >> anything in the releases repo is essentially tc-approved. Ghanshyam has >> volunteered to document this and do the removal. The board also needs to be >> notified of this and to look at projects.yaml in the governance repo as the >> source of truth for TC approved projects. The unassigned #ACTION item is to >> review remaining tags and see if there are others that need to be >> modified/removed/added to drive common behavior across OpenSack >> components. >> >> Board Proposals >> >> ---------------------- >> >> This was a pretty quick summary of all discussions we had that had any >> impact on the board and largely decided who would mention them. 
>> >> >> >> Session Feedback >> >> ------------------------ >> >> This was also a pretty quick topic compared to many of the others, we >> talked about how things went across all our discussions (largely we called >> the PTG a success) logistically. We tried to make good use of the raising >> hands feature which mostly worked, but it lacks context and its possible >> that the conversation has moved on by the time it’s your turn (if you even >> remember what you want to say). >> >> OpenStack 2.0: k8s Native >> >> ----------------------------------- >> >> This topic was brought up at the end of our time so we didn’t have time >> to discuss it really. Basically Mohammed wanted to start the conversation >> about adding k8s as a base service[5] and what we would do if a project >> proposed required k8s. Adding services that work with k8s could open a door >> to new innovation in OpenStack. Obviously this topic will need to be >> discussed further as we barely got started before we had to wrap things up. >> >> >> So. >> >> >> The tldr; >> >> >> Here are the #ACTION items we need owners for: >> >> - >> >> Start the User Facing API Pop Up Team >> - >> >> Write a resolution about how the deconstructed PTL roles will work >> - >> >> Update Goal Selection docs to explain that one or more goals is fine; >> it doesn’t have to be more than one >> - >> >> Two volunteers to start the W goal selection process >> - >> >> Assign two TC liaisons per project >> - >> >> Review Tags to make sure they are still good for driving common >> behavior across all openstack projects >> >> >> Here are the things EVERYONE needs to do: >> >> - >> >> Review last goal proposal so that we can decide to accept or reject >> it for the V release[4] >> - >> >> Add systems that are barriers to progress in openstack to the >> Reducing Systems and Friction list >> - >> >> Continue conversations you find important >> >> >> >> Thanks everyone for your hard work and great conversations :) >> >> Enjoy the attached (photoshopped) team photo :) >> >> -Kendall (diablo_rojo) >> >> >> >> [1] TC PTG Etherpad: https://etherpad.opendev.org/p/tc-victoria-ptg >> >> [2] TC Guide Patch: https://review.opendev.org/#/c/732983/ >> >> [3] UC TC Merge Thread: >> http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014736.html >> >> >> [4] Proposed V Goal: https://review.opendev.org/#/c/731213/ >> >> [5] Base Service Description: >> https://governance.openstack.org/tc/reference/base-services.html >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From monika.samal at outlook.com Thu Aug 6 13:38:30 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 6 Aug 2020 13:38:30 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: Thanks for responding ? ________________________________ From: Mark Goddard Sent: Thursday, August 6, 2020 1:16 PM To: Michael Johnson Cc: Monika Samal ; Fabian Zimmermann ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer On Wed, 5 Aug 2020 at 16:16, Michael Johnson > wrote: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Right now it's up to the operator to configure that. 
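For illustration only, a minimal sketch of the kind of manual setup this implies. The names, CIDR, port and flavor sizing are placeholders rather than anything kolla-ansible generates, and the controller hosts additionally need an interface (for example a VLAN) on this network so the worker and health-manager containers can actually reach the amphorae.

    openstack network create lb-mgmt-net
    openstack subnet create --network lb-mgmt-net \
        --subnet-range 172.16.0.0/22 lb-mgmt-subnet
    openstack security group create lb-mgmt-sec-grp
    # allow the controllers to reach the amphora agent API
    openstack security group rule create --protocol tcp \
        --dst-port 9443 lb-mgmt-sec-grp
    openstack flavor create --vcpus 1 --ram 1024 --disk 2 --private amphora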
The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard > wrote: On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson > Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann > Cc: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. 
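For reference, a hedged sketch of the octavia.conf sections touched in this thread; every ID, name and URL below is a placeholder, and the configuration rendered by kolla-ansible remains the authoritative source.

    [service_auth]
    auth_url = http://<internal-vip>:5000
    auth_type = password
    project_domain_name = Default
    user_domain_name = Default
    # must be the service project, not "admin" -- see the keystone errors above
    project_name = service
    username = octavia
    password = <octavia-password>

    [controller_worker]
    amp_image_tag = amphora
    amp_flavor_id = <amphora-flavor-id>
    amp_boot_network_list = <lb-mgmt-net-id>
    amp_secgroup_list = <lb-mgmt-sec-grp-id>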
Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From lijie at unitedstack.com Thu Aug 6 16:59:52 2020 From: lijie at unitedstack.com (=?utf-8?B?UmFtYm8=?=) Date: Fri, 7 Aug 2020 00:59:52 +0800 Subject: [cinder] Could you help me review the reimage feature? Message-ID: Hi,all:         I have a spec which is support volume backed server rebuild[0].This spec was accepted in Stein, but some of the work did not finish, so repropose it for Victoria.  I sincerely wish this spec will approved in Victoria, so I make an exception for this, and the Nova team will approved this if the cinder reimage question is solved this week[1].  This spec is depend on the cinder reimage api [2], and the reimage api has a question. We just need to know if cinder are ok with the change in polling to event like the volume extend. More clearly, Cinder reimage should add a new 'volume-reimage' external event like the volume extend, so that nova can wait for cinder to complete the reimage[3].        The Cinder code is[4], if you have some ideas, you can comments on it.Thank you very much! Ref: [0]:https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild [1]:http://eavesdrop.openstack.org/irclogs/%23openstack-meeting-3/%23openstack-meeting-3.2020-08-06.log.html#t2020-08-06T16:18:22-2 [2]:https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api [3]:https://review.opendev.org/#/c/454287/ [4]:https://review.opendev.org/#/c/606346/ Best Regards Rambo -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Aug 6 17:08:29 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 6 Aug 2020 17:08:29 +0000 Subject: [ironic] [infra] bifrost-integration-tinyipa-opensuse-15 broken In-Reply-To: References: Message-ID: <20200806170829.rcotrtmneyeyktbn@yuggoth.org> On 2020-08-06 12:13:36 +0200 (+0200), Dmitry Tantsur wrote: > Our openSUSE CI job has been broken for a few days [1]. 
It fails on the > early bindep stage with [2] > > percent="-1" rate="-1"/> > rate="-1" done="0"/> > File > './suse/x86_64/libJudy1-1.0.5-lp151.2.2.x86_64.rpm' not found on > medium ' > https://mirror.mtl01.inap.opendev.org/opensuse/distribution/leap/15.1/repo/oss/&apos > ; > > I've raised it on #openstack-infra, but I'm not sure if there has been any > follow up. [...] Yes, we discussed it at some length immediately after you mentioned it: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2020-08-05.log.html#t2020-08-05T15:43:03 In short, the packages are in /opensuse/distribution/leap/15.1/repo/oss/x86_64/ not /opensuse/distribution/leap/15.1/repo/oss/suse/x86_64/ and the INDEX.gz files seem to point to the correct location for them. It's not clear to us why zypper is looking in the latter path; help from someone with more familiarity with openSUSE and zypper would be much appreciated. Our mirrors match the official mirrors in this regard, and our base jobs configure only the first part of the repository path: It's not clear to any of us what's adding the "/suse" to the URLs zypper is requesting. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From rosmaita.fossdev at gmail.com Thu Aug 6 21:00:28 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 6 Aug 2020 17:00:28 -0400 Subject: [cinder] Could you help me review the reimage feature? In-Reply-To: References: Message-ID: <36500a46-7fcc-4fa0-fc09-d7235a833c9f@gmail.com> On 8/6/20 12:59 PM, Rambo wrote: > Hi,all: >         I have a spec which is support volume backed server > rebuild[0].This spec was accepted in Stein, but some of the work did not > finish, so repropose it for Victoria.  I sincerely wish this spec will > approved in Victoria, so I make an exception for this, and the Nova team > will approved this *if the cinder reimage question is solved this > week*[1].  This spec is depend on the cinder reimage api [2], and the > reimage api has a question. We just need to know if cinder are ok with > the change in polling to event like the volume extend. More clearly, > Cinder reimage should add a new 'volume-reimage' external event like the > volume extend, so that nova can wait for cinder to complete the reimage[3]. >        The Cinder code is[4], if you have some ideas, you can comments > on it.Thank you very much! The Cinder team is not going to approve this proposal this week, but we encourage you to continue working on it for Wallaby. The spec was approved for Stein and then re-targeted to Train. Until July 30, the last activity on the patch was April 1, 2019, so this has not been on the Cinder team's radar at all this development cycle. Because the spec is outdated, it should be proposed for Wallaby so the current Cinder team can review it and assess how it fits into the current project plans. I've already penciled you in for next week's midcycle so we can discuss this in more depth. But I am against making a snap decision in the next two days. 
cheers, brian > > > Ref: > > [0]:https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild > > [1]:http://eavesdrop.openstack.org/irclogs/%23openstack-meeting-3/%23openstack-meeting-3.2020-08-06.log.html#t2020-08-06T16:18:22-2 > > [2]:https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api > [3]:https://review.opendev.org/#/c/454287/ > [4]:https://review.opendev.org/#/c/606346/ > Best Regards > Rambo From cboylan at sapwetik.org Thu Aug 6 21:44:54 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 06 Aug 2020 14:44:54 -0700 Subject: [ironic] [infra] bifrost-integration-tinyipa-opensuse-15 broken In-Reply-To: <20200806170829.rcotrtmneyeyktbn@yuggoth.org> References: <20200806170829.rcotrtmneyeyktbn@yuggoth.org> Message-ID: <4e21ac6a-e287-4d3b-b0e6-c581c631e992@www.fastmail.com> On Thu, Aug 6, 2020, at 10:08 AM, Jeremy Stanley wrote: > On 2020-08-06 12:13:36 +0200 (+0200), Dmitry Tantsur wrote: > > Our openSUSE CI job has been broken for a few days [1]. It fails on the > > early bindep stage with [2] > > > > > percent="-1" rate="-1"/> > > > rate="-1" done="0"/> > > File > > './suse/x86_64/libJudy1-1.0.5-lp151.2.2.x86_64.rpm' not found on > > medium ' > > https://mirror.mtl01.inap.opendev.org/opensuse/distribution/leap/15.1/repo/oss/&apos > > ; > > > > I've raised it on #openstack-infra, but I'm not sure if there has been any > > follow up. > [...] > > Yes, we discussed it at some length immediately after you mentioned > it: > > http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2020-08-05.log.html#t2020-08-05T15:43:03 > > In short, the packages are in > /opensuse/distribution/leap/15.1/repo/oss/x86_64/ not > /opensuse/distribution/leap/15.1/repo/oss/suse/x86_64/ and the > INDEX.gz files seem to point to the correct location for them. It's > not clear to us why zypper is looking in the latter path; help from > someone with more familiarity with openSUSE and zypper would be much > appreciated. Our mirrors match the official mirrors in this regard, > and our base jobs configure only the first part of the repository > path: > > https://opendev.org/zuul/zuul-jobs/src/commit/1ba95015acc977dea8269889235434d052c736e2/roles/configure-mirrors/tasks/mirror/Suse.yaml#L3 > > > It's not clear to any of us what's adding the "/suse" to the URLs > zypper is requesting. https://review.opendev.org/745225 has landed and seems to fix this issue. Our hunch is that the type of the repo changed upstream of us which we then mirrored. Once this happened our repo configs were no longer correct. Zypper man pages and docs say repos should have their type auto-detected anyway so we've dropped the type specification entirely. This fixed things in testing. If anyone understands this better that info would be appreciated, but I expect the ironic jobs to be happier now too. Clark From kennelson11 at gmail.com Fri Aug 7 00:00:13 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 6 Aug 2020 17:00:13 -0700 Subject: [TC] New Office Hour Plans Message-ID: Hello! After taking a look at the poll results, Mohammed and I have two proposed plans for office hours: Plan A: Two office hours instead of three. This gives us slightly more coverage than one office hour without overextending ourselves to cover three office hours. 
Mohammed and I were thinking that one of the reasons why three office hours wasn't working was that it was kind of a big time commitment and TC members could easily rationalize not going to ones later in the week if they had already attended one earlier in the week. The two times that enable most TC members to attend at least one, if not both, would be Monday @14:00 UTC (TC members available: Belmiro, Rico, Kristi, Jay, Mohammed, myself, and Nate + non members) and Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). Plan B: Move to a single office hour on Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). Having only one office hour gives it more weight and importance and that should hopefully encourage more attendance from both community members and TC members alike. I guess Plan C is to go ahead with Plan A and then if we don't see activity during the Monday time slot, to reduce down to one office hour and go with Plan B. Please check out the patches Mohammed posted [1][2] and vote on what you'd prefer! -Kendall (diablo_rojo) [1] Dual Office Hour: https://review.opendev.org/#/c/745201/ [2] Single Office Hour: https://review.opendev.org/#/c/745200/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From monika.samal at outlook.com Thu Aug 6 23:11:52 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 6 Aug 2020 23:11:52 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: I tried following above document still facing same Octavia connection error with amphora image. Regards, Monika ________________________________ From: Mark Goddard Sent: Thursday, August 6, 2020 1:16:01 PM To: Michael Johnson Cc: Monika Samal ; Fabian Zimmermann ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer On Wed, 5 Aug 2020 at 16:16, Michael Johnson > wrote: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Right now it's up to the operator to configure that. The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard > wrote: On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson > Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann > Cc: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. 
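Since the connection error persists, a rough way to check whether the lb-mgmt-net wiring is the problem, assuming host networking for the Octavia containers and using placeholder addresses:

    # Find the amphora's lb-mgmt-net address, then test from a controller.
    openstack server list --all-projects
    ping -c 3 <amphora-lb-mgmt-ip>
    # The amphora agent listens on TCP 9443; a TLS/certificate error here
    # still proves reachability, while a timeout points at the network wiring.
    curl -vk https://<amphora-lb-mgmt-ip>:9443/
    # Heartbeats travel the other way, from the amphora to UDP 5555 on the
    # health manager; [health_manager] bind_ip / controller_ip_port_list in
    # octavia.conf must be reachable from the amphorae.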
I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? 
> > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From berndbausch at gmail.com Fri Aug 7 00:29:11 2020 From: berndbausch at gmail.com (Bernd Bausch) Date: Fri, 7 Aug 2020 09:29:11 +0900 Subject: [openstack-community] [infra] Problem with ask.openstack.org Message-ID: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> While ask.openstack.org is not necessarily loved by many, it continues to be used, and there are still people who answer questions. Recently, one of its features ceased working. I am talking about the "responses" page that lists all responses to questions that I have answered or commented on. This makes it very hard to follow up on such questions; I don't have a tool to see if somebody anwered my question or is the person who asked a question has provided updates. Is there anybody who can fix this? I know that some people would like to do away with ask.openstack.org entirely, since the software is bug-ridden and nobody manages the site. My personal opinion is that the current situation is worse than no "ask" site at all, since people might ask questions, get partial answers and no follow-up. This can create a negative view of the OpenStack community. In short, either fix it or remove it. Unfortunately I don't have the means to do either. Bernd. From laurentfdumont at gmail.com Fri Aug 7 00:36:26 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Thu, 6 Aug 2020 20:36:26 -0400 Subject: [openstack-community] [infra] Problem with ask.openstack.org In-Reply-To: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> References: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> Message-ID: It's definitely a tough sell. I'm not sure if it's worse to not have a community driven "a-la-Stackoverflow" style or one that is not super in shape. I would rather see go into archive mode only if it's too much of an Operational burden to keep running :( On Thu, Aug 6, 2020 at 8:33 PM Bernd Bausch wrote: > While ask.openstack.org is not necessarily loved by many, it continues > to be used, and there are still people who answer questions. > > Recently, one of its features ceased working. I am talking about the > "responses" page that lists all responses to questions that I have > answered or commented on. This makes it very hard to follow up on such > questions; I don't have a tool to see if somebody anwered my question or > is the person who asked a question has provided updates. > > Is there anybody who can fix this? > > I know that some people would like to do away with ask.openstack.org > entirely, since the software is bug-ridden and nobody manages the site. > My personal opinion is that the current situation is worse than no "ask" > site at all, since people might ask questions, get partial answers and > no follow-up. This can create a negative view of the OpenStack community. > > In short, either fix it or remove it. Unfortunately I don't have the > means to do either. > > Bernd. > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cboylan at sapwetik.org Fri Aug 7 00:41:36 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 06 Aug 2020 17:41:36 -0700 Subject: [openstack-community] [infra] Problem with ask.openstack.org In-Reply-To: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> References: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> Message-ID: <9b6967b9-5cb7-47f2-bacd-87d3304a3428@www.fastmail.com> On Thu, Aug 6, 2020, at 5:29 PM, Bernd Bausch wrote: > While ask.openstack.org is not necessarily loved by many, it continues > to be used, and there are still people who answer questions. > > Recently, one of its features ceased working. I am talking about the > "responses" page that lists all responses to questions that I have > answered or commented on. This makes it very hard to follow up on such > questions; I don't have a tool to see if somebody anwered my question or > is the person who asked a question has provided updates. > > Is there anybody who can fix this? > > I know that some people would like to do away with ask.openstack.org > entirely, since the software is bug-ridden and nobody manages the site. > My personal opinion is that the current situation is worse than no "ask" > site at all, since people might ask questions, get partial answers and > no follow-up. This can create a negative view of the OpenStack community. > > In short, either fix it or remove it. Unfortunately I don't have the > means to do either. I'm not able to debug the issue at this moment, but did want to point out that all of our config management is collaboratively managed in Git repos code reviewed in Gerrit. This means that if you know what the problem is you absolutely can fix it. Or if you'd prefer to turn off the service you can write a change for that as well. The biggest gap is in identifying the issue without access to server logs. Depending on the issue figuring out what is going on may require access. Relevant bits of code: https://opendev.org/opendev/system-config/src/branch/master/manifests/site.pp#L525-L538 https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/manifests/ask.pp https://opendev.org/opendev/puppet-askbot Finally, we also expose server and service statistics via cacti and graphite. These can be useful for checking service health: http://cacti.openstack.org/cacti/graph_view.php https://grafana.opendev.org/?orgId=1 Clark From yasufum.o at gmail.com Fri Aug 7 08:37:16 2020 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Fri, 7 Aug 2020 17:37:16 +0900 Subject: [tacker] PTL on vacation Message-ID: <9f29f573-69f5-d6d2-20d9-5c5ef7d775f4@gmail.com> I will be on vacation from 10th to 17th Aug. I would like to skip the next IRC meeting because many of tacker members are also on vacation next week. Thanks, Yasufumi From berndbausch at gmail.com Fri Aug 7 09:09:31 2020 From: berndbausch at gmail.com (Bernd Bausch) Date: Fri, 7 Aug 2020 18:09:31 +0900 Subject: [openstack-community] [infra] Problem with ask.openstack.org In-Reply-To: References: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> Message-ID: <21ed671d-63d2-53e1-1ce0-31b977515be6@gmail.com> Sending people to Stackoverflow directly is a good option IMO. This suggestion was made before. Of course, I would lose my 7700 karma points, but I can stomach it :) On 8/7/2020 9:36 AM, Laurent Dumont wrote: > It's definitely a tough sell. I'm not sure if it's worse to not have a > community driven "a-la-Stackoverflow" style or one that is not super > in shape. 
> > I would rather see go into archive mode only if it's too much of an > Operational burden to keep running :( > > On Thu, Aug 6, 2020 at 8:33 PM Bernd Bausch > wrote: > > While ask.openstack.org is not > necessarily loved by many, it continues > to be used, and there are still people who answer questions. > > Recently, one of its features ceased working. I am talking about the > "responses" page that lists all responses to questions that I have > answered or commented on. This makes it very hard to follow up on > such > questions; I don't have a tool to see if somebody anwered my > question or > is the person who asked a question has provided updates. > > Is there anybody who can fix this? > > I know that some people would like to do away with > ask.openstack.org > entirely, since the software is bug-ridden and nobody manages the > site. > My personal opinion is that the current situation is worse than no > "ask" > site at all, since people might ask questions, get partial answers > and > no follow-up. This can create a negative view of the OpenStack > community. > > In short, either fix it or remove it. Unfortunately I don't have the > means to do either. > > Bernd. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Fri Aug 7 09:18:48 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 7 Aug 2020 11:18:48 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: Thanks a lot for starting this discussion, I am also quite concerned about this. At StackHPC we started looking into CloudKitty a year ago, when the community was still fairly active. There was an IRC meeting every month or so throughout 2019. Patches were getting merged. Unfortunately in 2020 activity stopped abruptly. There hasn't been any IRC meeting since early December and no patch has been merged since the end of March. I have submitted straightforward stable backports of bug fixes which have not received any answer. I am well aware of the difficulty of keeping up with open-source project maintenance when work deadlines are always taking priority. If the existing core team would be willing to grant +2 votes to more people, I would be happy to participate in the maintenance of the project. We've now deployed CloudKitty for several of our customers and have to maintain a stable fork anyway. We would rather maintain upstream directly! Pierre Riteau (priteau) On Tue, 4 Aug 2020 at 23:22, Kendall Nelson wrote: > > I think the majority of 'maintenance' activities at the moment for Cloudkitty are the reviewing of open patches in gerrit [1] and triaging bugs that are reported in Launchpad[2] as they come in. When things come up on this mailing list that have the cloudkitty tag in the subject line (like this email), weighing in on them would also be helpful. > > If you need help getting setup with gerrit, I am happy to assist anyway I can :) > > -Kendall Nelson (diablo_rojo) > > [1] https://review.opendev.org/#/q/project:openstack/cloudkitty+OR+project:openstack/python-cloudkittyclient+OR+project:openstack/cloudkitty-dashboard > [2] https://launchpad.net/cloudkitty > > > On Tue, Aug 4, 2020 at 6:21 AM Rafael Weingärtner wrote: >> >> I am not sure how the projects/communities here in OpenStack are maintained and conducted, but I could for sure help. >> I am a committer and PMC for some Apache projects; therefore, I am a bit familiar with some processes in OpenSource communities. 
>> >> On Tue, Aug 4, 2020 at 5:11 AM Mark Goddard wrote: >>> >>> On Thu, 30 Jul 2020 at 14:43, Rafael Weingärtner >>> wrote: >>> > >>> > We are working on it. So far we have 3 open proposals there, but we do not have enough karma to move things along. >>> > Besides these 3 open proposals, we do have more ongoing extensions that have not yet been proposed to the community. >>> >>> It's good to hear you want to help improve cloudkitty, however it >>> sounds like what is required is help with maintaining the project. Is >>> that something you could be involved with? >>> Mark >>> >>> > >>> > On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis wrote: >>> >> >>> >> Posting here to raise awareness, and start discussion about next steps. >>> >> >>> >> It appears there is no one working on Cloudkitty anymore. No patches >>> >> have been merged for several months now, including simple bot proposed >>> >> patches. It would appear no one is maintaining this project anymore. >>> >> >>> >> I know there is a need out there for this type of functionality, so >>> >> maybe this will raise awareness and get some attention to it. But >>> >> barring that, I am wondering if we should start the process to retire >>> >> this project. >>> >> >>> >> From a Victoria release perspective, it is milestone-2 week, so we >>> >> should make a decision if any of the Cloudkitty deliverables should be >>> >> included in this release or not. We can certainly force releases of >>> >> whatever is the latest, but I think that is a bit risky since these >>> >> repos have never merged the job template change for victoria and >>> >> therefore are not even testing with Python 3.8. That is an official >>> >> runtime for Victoria, so we run the risk of having issues with the code >>> >> if someone runs under 3.8 but we have not tested to make sure there are >>> >> no problems doing so. >>> >> >>> >> I am hoping this at least starts the discussion. I will not propose any >>> >> release patches to remove anything until we have had a chance to discuss >>> >> the situation. >>> >> >>> >> Sean >>> >> >>> >> >>> > >>> > >>> > -- >>> > Rafael Weingärtner >> >> >> >> -- >> Rafael Weingärtner From skaplons at redhat.com Fri Aug 7 10:19:38 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 7 Aug 2020 12:19:38 +0200 Subject: [neutron][OVS firewall] Multicast non-IGMP traffic is allowed by default, not in iptables FW (LP#1889631) In-Reply-To: References: Message-ID: <38E3A820-FD9D-4A2B-B989-4735092D304F@redhat.com> Hi, > On 4 Aug 2020, at 19:05, Rodolfo Alonso Hernandez wrote: > > Hello all: > > First of all, the link: https://bugs.launchpad.net/neutron/+bug/1889631 > > To sum up the bug: in iptables FW, the non-IGMP multicast traffic from 224.0.0.x was blocked; this is not happening in OVS FW. > > That was discussed today in the Neutron meeting today [1]. We face two possible situations here: > - If we block this traffic now, some deployments using the OVS FW will experience an unexpected network blockage. I would be for this option but left stable branches not touched. Additionally we should of course add release note with info that this behaviour changed now and also we can add upgrade check which will write warning about that if any of the agents in the DB is using “openvswitch” firewall driver. I don’t think we can do anything more to warn users about such change. > - Deployments migrating from iptables to OVS FW, now won't be able to explicitly allow this traffic (or block it by default). 
This also breaks the current API, because some rules won't have any effect (those ones allowing this traffic). This is current issue, right? If we would fix it as You proposed above, then behaviour between both drivers would be the same. Am I understanding correct? > > A possible solution is to add a new knob in the FW configuration; this config option will allow to block or not this traffic by default. Remember that the FW can only create permissive rules, not blocking ones. I don’t like to add yet another config knob for that. And also as I think Akihiro mentioned it’s not good practice to change API behaviour depending on config options. This wouldn’t be discoverable in API. > > Any feedback is welcome! > > Regards. > > [1]http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-08-04-14.00.log.html#l-136 > > — Slawek Kaplonski Principal software engineer Red Hat From ralonsoh at redhat.com Fri Aug 7 10:30:53 2020 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 7 Aug 2020 11:30:53 +0100 Subject: [neutron][OVS firewall] Multicast non-IGMP traffic is allowed by default, not in iptables FW (LP#1889631) In-Reply-To: <38E3A820-FD9D-4A2B-B989-4735092D304F@redhat.com> References: <38E3A820-FD9D-4A2B-B989-4735092D304F@redhat.com> Message-ID: Hi Slawek: I agree with Akihiro and you: - This should be fixed to match both FW behaviour, but only in master. - Of course, a "big" release note to make this public. - Not to add a knob that changes the API behaviour. I'll wait for more feedback. Although I'll be on PTO, I'll check and reply to the mail. Thank you and regards. On Fri, Aug 7, 2020 at 11:19 AM Slawek Kaplonski wrote: > Hi, > > > On 4 Aug 2020, at 19:05, Rodolfo Alonso Hernandez > wrote: > > > > Hello all: > > > > First of all, the link: https://bugs.launchpad.net/neutron/+bug/1889631 > > > > To sum up the bug: in iptables FW, the non-IGMP multicast traffic from > 224.0.0.x was blocked; this is not happening in OVS FW. > > > > That was discussed today in the Neutron meeting today [1]. We face two > possible situations here: > > - If we block this traffic now, some deployments using the OVS FW will > experience an unexpected network blockage. > > I would be for this option but left stable branches not touched. > Additionally we should of course add release note with info that this > behaviour changed now and also we can add upgrade check which will write > warning about that if any of the agents in the DB is using “openvswitch” > firewall driver. > I don’t think we can do anything more to warn users about such change. > > > - Deployments migrating from iptables to OVS FW, now won't be able to > explicitly allow this traffic (or block it by default). This also breaks > the current API, because some rules won't have any effect (those ones > allowing this traffic). > > This is current issue, right? If we would fix it as You proposed above, > then behaviour between both drivers would be the same. Am I understanding > correct? > > > > > A possible solution is to add a new knob in the FW configuration; this > config option will allow to block or not this traffic by default. Remember > that the FW can only create permissive rules, not blocking ones. > > I don’t like to add yet another config knob for that. And also as I think > Akihiro mentioned it’s not good practice to change API behaviour depending > on config options. This wouldn’t be discoverable in API. > > > > > Any feedback is welcome! > > > > Regards. 
> > > > [1] > http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-08-04-14.00.log.html#l-136 > > > > > > — > Slawek Kaplonski > Principal software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Aug 7 12:45:15 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 7 Aug 2020 12:45:15 +0000 Subject: [openstack-community] [infra] Problem with ask.openstack.org In-Reply-To: <21ed671d-63d2-53e1-1ce0-31b977515be6@gmail.com> References: <6d128727-27b5-ff0d-6798-fbcf72998012@gmail.com> <21ed671d-63d2-53e1-1ce0-31b977515be6@gmail.com> Message-ID: <20200807124515.7k5xhijkj6mi4lec@yuggoth.org> On 2020-08-07 18:09:31 +0900 (+0900), Bernd Bausch wrote: > Sending people to Stackoverflow directly is a good option IMO. [...] Yes, ask.openstack.org was originally created for two reasons: 1. We could not keep up with the constant spam load on forums.openstack.org, but when we wanted to shut it down we kept hearing that many OpenStack users needed us to provide a Web forum because they wouldn't/couldn't use E-mail. 2. When we approached Stackexchange/Stackoverflow about getting a site like Ubuntu had, they said OpenStack was not popular enough software to warrant that. OSF originally contracted the author of Askbot to assist in maintaining the ask.openstack.org site, but his interests eventually moved on to other endeavors and the site has sat unmaintained (except for an occasional reboot by community infrastructure team sysadmins) for a number of years now. At this point it's a liability, and unless folks are interested in getting it back into a well-managed state I think we probably have no choice but to phase it out. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Fri Aug 7 13:33:18 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 07 Aug 2020 08:33:18 -0500 Subject: [TC] New Office Hour Plans In-Reply-To: References: Message-ID: <173c920361f.e63c2eb4109191.7004381406663804589@ghanshyammann.com> ---- On Thu, 06 Aug 2020 19:00:13 -0500 Kendall Nelson wrote ---- > Hello! > After taking a look at the poll results, Mohammed and I have two proposed plans for office hours: > Plan A: Two office hours instead of three. This gives us slightly more coverage than one office hour without overextending ourselves to cover three office hours. Mohammed and I were thinking that one of the reasons why three office hours wasn't working was that it was kind of a big time commitment and TC members could easily rationalize not going to ones later in the week if they had already attended one earlier in the week. The two times that enable most TC members to attend at least one, if not both, would be Monday @14:00 UTC (TC members available: Belmiro, Rico, Kristi, Jay, Mohammed, myself, and Nate + non members) and Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). > Plan B: Move to a single office hour on Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). Having only one office hour gives it more weight and importance and that should hopefully encourage more attendance from both community members and TC members alike. Thanks Kendal for following up on office hours plan. 
The idea of having multiple office hours was to cover TC availability across different TZ. Even Asia TZ office hours might not have many TC members available but still, someone to address/ack the issues and bring it to TC when most of the members are available. There was no expectation for all TC members to be present in all three office hours so all office hours being inactive might be due to some other reason not due to *many office hours*. Thursday office hour was most TC available one which is not the case anymore. I MO, we should consider the 'covering most of TZ (as much we can) for TC-availability' so in first option we can move either of the office hour in different TZ. I still in favor of moving to weekly TC meeting (in alternate TZ or so) than office hours but I am ok to give office hours a another try with new time. -gmann > I guess Plan C is to go ahead with Plan A and then if we don't see activity during the Monday time slot, to reduce down to one office hour and go with Plan B. > Please check out the patches Mohammed posted [1][2] and vote on what you'd prefer! > -Kendall (diablo_rojo) > [1] Dual Office Hour: https://review.opendev.org/#/c/745201/[2] Single Office Hour: https://review.opendev.org/#/c/745200/ From gmann at ghanshyammann.com Fri Aug 7 14:10:53 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 07 Aug 2020 09:10:53 -0500 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Thanks, Pierre for helping with this. ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) but I am not sure if he got any response back. Can you also send email to PTL as well as the current core team to add you in the core list for project maintenance? Please note that, migration of CI/CD to ubuntu work might break the cloudkitty gate if patches are not merged on time. I am still working on few repos though. -gmann ---- On Fri, 07 Aug 2020 04:18:48 -0500 Pierre Riteau wrote ---- > Thanks a lot for starting this discussion, I am also quite concerned about this. > > At StackHPC we started looking into CloudKitty a year ago, when the > community was still fairly active. There was an IRC meeting every > month or so throughout 2019. Patches were getting merged. > > Unfortunately in 2020 activity stopped abruptly. There hasn't been any > IRC meeting since early December and no patch has been merged since > the end of March. I have submitted straightforward stable backports of > bug fixes which have not received any answer. > > I am well aware of the difficulty of keeping up with open-source > project maintenance when work deadlines are always taking priority. If > the existing core team would be willing to grant +2 votes to more > people, I would be happy to participate in the maintenance of the > project. We've now deployed CloudKitty for several of our customers > and have to maintain a stable fork anyway. We would rather maintain > upstream directly! > > Pierre Riteau (priteau) > > > On Tue, 4 Aug 2020 at 23:22, Kendall Nelson wrote: > > > > I think the majority of 'maintenance' activities at the moment for Cloudkitty are the reviewing of open patches in gerrit [1] and triaging bugs that are reported in Launchpad[2] as they come in. When things come up on this mailing list that have the cloudkitty tag in the subject line (like this email), weighing in on them would also be helpful. 
> > > > If you need help getting setup with gerrit, I am happy to assist anyway I can :) > > > > -Kendall Nelson (diablo_rojo) > > > > [1] https://review.opendev.org/#/q/project:openstack/cloudkitty+OR+project:openstack/python-cloudkittyclient+OR+project:openstack/cloudkitty-dashboard > > [2] https://launchpad.net/cloudkitty > > > > > > On Tue, Aug 4, 2020 at 6:21 AM Rafael Weingärtner wrote: > >> > >> I am not sure how the projects/communities here in OpenStack are maintained and conducted, but I could for sure help. > >> I am a committer and PMC for some Apache projects; therefore, I am a bit familiar with some processes in OpenSource communities. > >> > >> On Tue, Aug 4, 2020 at 5:11 AM Mark Goddard wrote: > >>> > >>> On Thu, 30 Jul 2020 at 14:43, Rafael Weingärtner > >>> wrote: > >>> > > >>> > We are working on it. So far we have 3 open proposals there, but we do not have enough karma to move things along. > >>> > Besides these 3 open proposals, we do have more ongoing extensions that have not yet been proposed to the community. > >>> > >>> It's good to hear you want to help improve cloudkitty, however it > >>> sounds like what is required is help with maintaining the project. Is > >>> that something you could be involved with? > >>> Mark > >>> > >>> > > >>> > On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis wrote: > >>> >> > >>> >> Posting here to raise awareness, and start discussion about next steps. > >>> >> > >>> >> It appears there is no one working on Cloudkitty anymore. No patches > >>> >> have been merged for several months now, including simple bot proposed > >>> >> patches. It would appear no one is maintaining this project anymore. > >>> >> > >>> >> I know there is a need out there for this type of functionality, so > >>> >> maybe this will raise awareness and get some attention to it. But > >>> >> barring that, I am wondering if we should start the process to retire > >>> >> this project. > >>> >> > >>> >> From a Victoria release perspective, it is milestone-2 week, so we > >>> >> should make a decision if any of the Cloudkitty deliverables should be > >>> >> included in this release or not. We can certainly force releases of > >>> >> whatever is the latest, but I think that is a bit risky since these > >>> >> repos have never merged the job template change for victoria and > >>> >> therefore are not even testing with Python 3.8. That is an official > >>> >> runtime for Victoria, so we run the risk of having issues with the code > >>> >> if someone runs under 3.8 but we have not tested to make sure there are > >>> >> no problems doing so. > >>> >> > >>> >> I am hoping this at least starts the discussion. I will not propose any > >>> >> release patches to remove anything until we have had a chance to discuss > >>> >> the situation. > >>> >> > >>> >> Sean > >>> >> > >>> >> > >>> > > >>> > > >>> > -- > >>> > Rafael Weingärtner > >> > >> > >> > >> -- > >> Rafael Weingärtner > > From mark at stackhpc.com Fri Aug 7 14:11:12 2020 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 7 Aug 2020 15:11:12 +0100 Subject: [kolla] Kolla klub break Message-ID: Hi, We agreed in Wednesday's IRC meeting to take a short summer break from the klub. Let's meet again on 10th September. Thanks to everyone who has taken part in these meetings so far, we've had some really great discussions. As always, if anyone has ideas for topics, please add them to the Google doc. Looking forward to some more great sessions in September. 
https://docs.google.com/document/d/1EwQs2GXF-EvJZamEx9vQAOSDB5tCjsDCJyHQN5_4_Sw/edit# Thanks, Mark From balazs.gibizer at est.tech Fri Aug 7 15:26:53 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Fri, 07 Aug 2020 17:26:53 +0200 Subject: [nova] Nova PTL is on PTO until 24th of Aug Message-ID: Hi, I will be on vacation during the next two weeks. Cheers, gibi From pierre at stackhpc.com Fri Aug 7 16:10:45 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 7 Aug 2020 18:10:45 +0200 Subject: Helping out with CloudKitty maintenance Message-ID: Hello, Following the discussion about the state of CloudKitty [1], I would like to volunteer my help with maintaining the project, as no one of the core team appears to be active at the moment. I have been working with CloudKitty for about a year and have used both the Gnocchi and Monasca collectors. Being a core reviewer on two other OpenStack projects, I am familiar with the process of maintaining OpenStack code. Would it be possible to get core reviewer privileges to help? I would initially focus on keeping CI green and making sure bug fixes are merged and backported. Thanks in advance, Pierre Riteau (priteau) [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016384.html From rafaelweingartner at gmail.com Fri Aug 7 16:21:53 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Fri, 7 Aug 2020 13:21:53 -0300 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: I see. Thanks for the heads up. I will try to dedicate some time every week for these tasks. On Tue, Aug 4, 2020 at 6:22 PM Kendall Nelson wrote: > I think the majority of 'maintenance' activities at the moment for > Cloudkitty are the reviewing of open patches in gerrit [1] and triaging > bugs that are reported in Launchpad[2] as they come in. When things come up > on this mailing list that have the cloudkitty tag in the subject line (like > this email), weighing in on them would also be helpful. > > If you need help getting setup with gerrit, I am happy to assist anyway I > can :) > > -Kendall Nelson (diablo_rojo) > > [1] > https://review.opendev.org/#/q/project:openstack/cloudkitty+OR+project:openstack/python-cloudkittyclient+OR+project:openstack/cloudkitty-dashboard > [2] https://launchpad.net/cloudkitty > > > On Tue, Aug 4, 2020 at 6:21 AM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> I am not sure how the projects/communities here in OpenStack are >> maintained and conducted, but I could for sure help. >> I am a committer and PMC for some Apache projects; therefore, I am a bit >> familiar with some processes in OpenSource communities. >> >> On Tue, Aug 4, 2020 at 5:11 AM Mark Goddard wrote: >> >>> On Thu, 30 Jul 2020 at 14:43, Rafael Weingärtner >>> wrote: >>> > >>> > We are working on it. So far we have 3 open proposals there, but we do >>> not have enough karma to move things along. >>> > Besides these 3 open proposals, we do have more ongoing extensions >>> that have not yet been proposed to the community. >>> >>> It's good to hear you want to help improve cloudkitty, however it >>> sounds like what is required is help with maintaining the project. Is >>> that something you could be involved with? >>> Mark >>> >>> > >>> > On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis >>> wrote: >>> >> >>> >> Posting here to raise awareness, and start discussion about next >>> steps. >>> >> >>> >> It appears there is no one working on Cloudkitty anymore. 
No patches >>> >> have been merged for several months now, including simple bot proposed >>> >> patches. It would appear no one is maintaining this project anymore. >>> >> >>> >> I know there is a need out there for this type of functionality, so >>> >> maybe this will raise awareness and get some attention to it. But >>> >> barring that, I am wondering if we should start the process to retire >>> >> this project. >>> >> >>> >> From a Victoria release perspective, it is milestone-2 week, so we >>> >> should make a decision if any of the Cloudkitty deliverables should be >>> >> included in this release or not. We can certainly force releases of >>> >> whatever is the latest, but I think that is a bit risky since these >>> >> repos have never merged the job template change for victoria and >>> >> therefore are not even testing with Python 3.8. That is an official >>> >> runtime for Victoria, so we run the risk of having issues with the >>> code >>> >> if someone runs under 3.8 but we have not tested to make sure there >>> are >>> >> no problems doing so. >>> >> >>> >> I am hoping this at least starts the discussion. I will not propose >>> any >>> >> release patches to remove anything until we have had a chance to >>> discuss >>> >> the situation. >>> >> >>> >> Sean >>> >> >>> >> >>> > >>> > >>> > -- >>> > Rafael Weingärtner >>> >> >> >> -- >> Rafael Weingärtner >> > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.ramirez at opencloud.es Fri Aug 7 16:30:30 2020 From: luis.ramirez at opencloud.es (Luis Ramirez) Date: Fri, 7 Aug 2020 18:30:30 +0200 Subject: Helping out with CloudKitty maintenance In-Reply-To: References: Message-ID: Hi, +1. We need to move fwd to keep it active. I’m also working on a charm for CloudKitty. Br Luis Rmz El El vie, 7 ago 2020 a las 18:16, Pierre Riteau escribió: > Hello, > > Following the discussion about the state of CloudKitty [1], I would > like to volunteer my help with maintaining the project, as no one of > the core team appears to be active at the moment. I have been working > with CloudKitty for about a year and have used both the Gnocchi and > Monasca collectors. Being a core reviewer on two other OpenStack > projects, I am familiar with the process of maintaining OpenStack > code. > > Would it be possible to get core reviewer privileges to help? I would > initially focus on keeping CI green and making sure bug fixes are > merged and backported. > > Thanks in advance, > Pierre Riteau (priteau) > > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016384.html > > -- Br, Luis Rmz Blockchain, DevOps & Open Source Cloud Solutions Architect ---------------------------------------- Founder & CEO OpenCloud.es luis.ramirez at opencloud.es Skype ID: d.overload Hangouts: luis.ramirez at opencloud.es +34 911 950 123 / +39 392 1289553 / +49 152 26917722 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Fri Aug 7 17:12:25 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Fri, 7 Aug 2020 12:12:25 -0500 Subject: [TC] New Office Hour Plans In-Reply-To: <173c920361f.e63c2eb4109191.7004381406663804589@ghanshyammann.com> References: <173c920361f.e63c2eb4109191.7004381406663804589@ghanshyammann.com> Message-ID: <976bf811-536b-faff-cb30-dbab1ac6d83a@gmail.com> On 8/7/2020 8:33 AM, Ghanshyam Mann wrote: > ---- On Thu, 06 Aug 2020 19:00:13 -0500 Kendall Nelson wrote ---- > > Hello! 
> > After taking a look at the poll results, Mohammed and I have two proposed plans for office hours: > > Plan A: Two office hours instead of three. This gives us slightly more coverage than one office hour without overextending ourselves to cover three office hours. Mohammed and I were thinking that one of the reasons why three office hours wasn't working was that it was kind of a big time commitment and TC members could easily rationalize not going to ones later in the week if they had already attended one earlier in the week. The two times that enable most TC members to attend at least one, if not both, would be Monday @14:00 UTC (TC members available: Belmiro, Rico, Kristi, Jay, Mohammed, myself, and Nate + non members) and Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). > > Plan B: Move to a single office hour on Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). Having only one office hour gives it more weight and importance and that should hopefully encourage more attendance from both community members and TC members alike. > > Thanks Kendal for following up on office hours plan. > > The idea of having multiple office hours was to cover TC availability across different TZ. Even Asia TZ office > hours might not have many TC members available but still, someone to address/ack the issues and bring it to > TC when most of the members are available. There was no expectation for all TC members to be present in all > three office hours so all office hours being inactive might be due to some other reason not due to *many office hours*. > > Thursday office hour was most TC available one which is not the case anymore. > > I MO, we should consider the 'covering most of TZ (as much we can) for TC-availability' so in first option we can move > either of the office hour in different TZ. I think that Gmann makes a good point here.  If we are going to have multiple office hours one should be in an AP timezone.  Was there a second time where the most people were in the AP timeframe were available? So, reduce to two office hours and try to cover both sides of the world? Jay > > I still in favor of moving to weekly TC meeting (in alternate TZ or so) than office hours but I am ok to give office hours a > another try with new time. > > -gmann > > > I guess Plan C is to go ahead with Plan A and then if we don't see activity during the Monday time slot, to reduce down to one office hour and go with Plan B. > > Please check out the patches Mohammed posted [1][2] and vote on what you'd prefer! > > -Kendall (diablo_rojo) > > [1] Dual Office Hour: https://review.opendev.org/#/c/745201/[2] Single Office Hour: https://review.opendev.org/#/c/745200/ > From sean.mcginnis at gmx.com Fri Aug 7 19:56:38 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 7 Aug 2020 14:56:38 -0500 Subject: [all] Proposed Wallaby cycle schedule Message-ID: <2e56de68-c416-e3ea-f3da-caaf9399287d@gmx.com> Hey everyone, The Victoria cycle is going by fast, and it's already time to start planning some of the early things for the Wallaby release. One of the first steps for that is actually deciding on the release schedule. Typically we have done this based on when the next Summit event was planned to take place. Due to several reasons, we don't have a date yet for the first 2021 event. 
The current thinking is it will likely take place in May (nothing is set, just an educated guess, so please don't use that for any other planning). So for the sake of figuring out the release schedule, we are targeting a release date in early May. Hopefully this will then align well with event plans. I have a proposed release schedule up for review here: https://review.opendev.org/#/c/744729/ For ease of viewing (until the job logs are garbage collected), you can see the rendered schedule here: https://0e6b8aeca433e85b429b-46fd243db6dc394538bd0555f339eba5.ssl.cf1.rackcdn.com/744729/3/check/openstack-tox-docs/4f76901/docs/wallaby/schedule.html There are always outside conflicts, but I think this has aligned mostly well with major holidays. But please feel free to comment on the patch if you see any major issues that we may have not considered. Thanks! Sean From mnaser at vexxhost.com Fri Aug 7 20:22:08 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Aug 2020 16:22:08 -0400 Subject: [TC] New Office Hour Plans In-Reply-To: <173c920361f.e63c2eb4109191.7004381406663804589@ghanshyammann.com> References: <173c920361f.e63c2eb4109191.7004381406663804589@ghanshyammann.com> Message-ID: On Fri, Aug 7, 2020 at 9:37 AM Ghanshyam Mann wrote: > > ---- On Thu, 06 Aug 2020 19:00:13 -0500 Kendall Nelson wrote ---- > > Hello! > > After taking a look at the poll results, Mohammed and I have two proposed plans for office hours: > > Plan A: Two office hours instead of three. This gives us slightly more coverage than one office hour without overextending ourselves to cover three office hours. Mohammed and I were thinking that one of the reasons why three office hours wasn't working was that it was kind of a big time commitment and TC members could easily rationalize not going to ones later in the week if they had already attended one earlier in the week. The two times that enable most TC members to attend at least one, if not both, would be Monday @14:00 UTC (TC members available: Belmiro, Rico, Kristi, Jay, Mohammed, myself, and Nate + non members) and Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). > > Plan B: Move to a single office hour on Wednesday @ 15:00 UTC (TC members available: Rico, Kristi, Jay, Mohammed, myself, Ghanshyam, and Nate + non members). Having only one office hour gives it more weight and importance and that should hopefully encourage more attendance from both community members and TC members alike. > > Thanks Kendal for following up on office hours plan. > > The idea of having multiple office hours was to cover TC availability across different TZ. Even Asia TZ office > hours might not have many TC members available but still, someone to address/ack the issues and bring it to > TC when most of the members are available. There was no expectation for all TC members to be present in all > three office hours so all office hours being inactive might be due to some other reason not due to *many office hours*. I think if we limit the number of times, then more people can likely show up because it's a smaller commitment. The 2 office hours except Thursday are pretty much non-existant at that point > Thursday office hour was most TC available one which is not the case anymore. The Wednesday was actually the time where we had 10 people mention they'd be available, 9 of them being TC members. 
I'm hoping that is the most successful time line > I MO, we should consider the 'covering most of TZ (as much we can) for TC-availability' so in first option we can move > either of the office hour in different TZ. > > I still in favor of moving to weekly TC meeting (in alternate TZ or so) than office hours but I am ok to give office hours a > another try with new time. I think given the commitment I see here, I am confident Wednesday should be successful: https://doodle.com/poll/q27t8pucq7b8xbme > -gmann > > > I guess Plan C is to go ahead with Plan A and then if we don't see activity during the Monday time slot, to reduce down to one office hour and go with Plan B. > > Please check out the patches Mohammed posted [1][2] and vote on what you'd prefer! > > -Kendall (diablo_rojo) > > [1] Dual Office Hour: https://review.opendev.org/#/c/745201/[2] Single Office Hour: https://review.opendev.org/#/c/745200/ > -- Mohammed Naser VEXXHOST, Inc. From its-openstack at zohocorp.com Fri Aug 7 07:21:32 2020 From: its-openstack at zohocorp.com (its-openstack at zohocorp.com) Date: Fri, 07 Aug 2020 12:51:32 +0530 Subject: Openstack-Train VCPU issue in Hyper-V Message-ID: <173c7cbda12.e31c17678315.6864041805036536996@zohocorp.com> Dear Team,    We are using Openstack-Train in our organization.We have created windows server 2016 Std R2 instances with this flavor m5.xlarge ( RAM - 65536 , Disk - 500 , VCPUs - 16 ).Once Hyper-V future enabled in this instances VCPU count is automatically reduced to 1 core after restart.Even we have enabled nested virtualisation in openstack compute server.Herewith attached screenshot for your references.Please help us to short out this issue. #cat /sys/module/kvm_intel/parameters/nested Y Regards, Sysadmin. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Before Hyper-V.bmp Type: application/octet-stream Size: 2306502 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: After Hyper-V .png Type: image/png Size: 47337 bytes Desc: not available URL: From cohuck at redhat.com Fri Aug 7 11:59:42 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Fri, 7 Aug 2020 13:59:42 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> References: <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> Message-ID: <20200807135942.5d56a202.cohuck@redhat.com> On Wed, 05 Aug 2020 12:35:01 +0100 Sean Mooney wrote: > On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote: > > Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: (...) > > > software_version: device driver's version. > > > in .[.bugfix] scheme, where there is no > > > compatibility across major versions, minor versions have > > > forward compatibility (ex. 
1-> 2 is ok, 2 -> 1 is not) and > > > bugfix version number indicates some degree of internal > > > improvement that is not visible to the user in terms of > > > features or compatibility, > > > > > > vendor specific attributes: each vendor may define different attributes > > > device id : device id of a physical devices or mdev's parent pci device. > > > it could be equal to pci id for pci devices > > > aggregator: used together with mdev_type. e.g. aggregator=2 together > > > with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel > > > graphics device. > > > remote_url: for a local NVMe VF, it may be configured with a remote > > > url of a remote storage and all data is stored in the > > > remote side specified by the remote url. > > > ... > just a minor not that i find ^ much more simmple to understand then > the current proposal with self and compatiable. > if i have well defiend attibute that i can parse and understand that allow > me to calulate the what is and is not compatible that is likely going to > more useful as you wont have to keep maintianing a list of other compatible > devices every time a new sku is released. > > in anycase thank for actully shareing ^ as it make it simpler to reson about what > you have previously proposed. So, what would be the most helpful format? A 'software_version' field that follows the conventions outlined above, and other (possibly optional) fields that have to match? (...) > > Thanks for the explanation, I'm still fuzzy about the details. > > Anyway, I suggest you to check "devlink dev info" command we have > > implemented for multiple drivers. > > is devlink exposed as a filesytem we can read with just open? > openstack will likely try to leverage libvirt to get this info but when we > cant its much simpler to read sysfs then it is to take a a depenency on a commandline > too and have to fork shell to execute it and parse the cli output. > pyroute2 which we use in some openstack poject has basic python binding for devlink but im not > sure how complete it is as i think its relitivly new addtion. if we need to take a dependcy > we will but that would be a drawback fo devlink not that that is a large one just something > to keep in mind. A devlinkfs, maybe? At least for reading information (IIUC, "devlink dev info" is only about information retrieval, right?) From sean.mcginnis at gmx.com Fri Aug 7 21:09:00 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 7 Aug 2020 16:09:00 -0500 Subject: [sigs][vendors] Proposal to create Hardware Vendor SIG Message-ID: <5d4928c2-8e14-82a7-c06b-dd8df4de44fb@gmx.com> Hey everyone, OpenStack is a community for creating open source software, but by the nature of infrastructure management, there is a strong intersection with hardware, and therefore hardware vendors. Nova, Cinder, Neutron, Ironic, and many others need to interact with hardware and support vendor specific drivers for this interaction. There is currently a spectrum where some of this hardware interaction is done as openly as possible, while others develop their integration "glue" behind closed doors. Part of this is the disconnect between the fully open development of OpenStack, and the traditionally proprietary development of many products. For those that do want to do their vendor-specific development as openly as possible - and hopefully attract the involvement of customers, partners, and others in the community - there hasn't been a great venue for this so far. 
Some vendors that have done open development have even had difficulty finding a place in the community and ended up deciding to just develop behind closed doors. I would like to try to change this, so I am proposing the creation of the Hardware Vendor SIG as a place where we can collaborate on vendor specific things, and encourage development to happen in the open. This would be a place for any vendors and interested parties to work together to fix bugs, implement features, and overall improve the quality of anything that helps provide that glue to bridge the gap between our open source services and vendor hardware. This would include servers, storage, networking, and really anything else that plays a role in being able to set up an OpenStack cloud. This is a call out to any others interested in participating. If you are interested in this effort, and if you have any existing code (whether hosted on OpenDev, hosted on GitHub, or hosted on your own platform) that you think would be a good fit for this, please add your contact information and any relevant details here: https://etherpad.opendev.org/p/HardwareVendorSIG Also, please feel free to show your support by voting on the proposal to create this SIG here: https://review.opendev.org/#/c/745185/ Thanks! Sean From zigo at debian.org Fri Aug 7 21:19:45 2020 From: zigo at debian.org (Thomas Goirand) Date: Fri, 7 Aug 2020 23:19:45 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: On 8/7/20 4:10 PM, Ghanshyam Mann wrote: > Thanks, Pierre for helping with this. > > ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) > but I am not sure if he got any response back. The end of the very good maintenance of Cloudkitty matched the date when objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? This is very disappointing as I've been using it for some time already, and that I was satisfied by it (ie: it does the job...), and especially that latest releases are able to scale correctly. I very much would love if Pierre Riteau was successful in taking over. Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. Cheers, Thomas Goirand (zigo) From Arkady.Kanevsky at dell.com Sat Aug 8 02:48:01 2020 From: Arkady.Kanevsky at dell.com (Kanevsky, Arkady) Date: Sat, 8 Aug 2020 02:48:01 +0000 Subject: [sigs][vendors] Proposal to create Hardware Vendor SIG In-Reply-To: <5d4928c2-8e14-82a7-c06b-dd8df4de44fb@gmx.com> References: <5d4928c2-8e14-82a7-c06b-dd8df4de44fb@gmx.com> Message-ID: Great idea. Long time overdue. Great place for many out-of-tree repos. Thanks Arkady -----Original Message----- From: Sean McGinnis Sent: Friday, August 7, 2020 4:09 PM To: openstack-discuss Subject: [sigs][vendors] Proposal to create Hardware Vendor SIG [EXTERNAL EMAIL] Hey everyone, OpenStack is a community for creating open source software, but by the nature of infrastructure management, there is a strong intersection with hardware, and therefore hardware vendors. Nova, Cinder, Neutron, Ironic, and many others need to interact with hardware and support vendor specific drivers for this interaction. There is currently a spectrum where some of this hardware interaction is done as openly as possible, while others develop their integration "glue" behind closed doors. 
Part of this is the disconnect between the fully open development of OpenStack, and the traditionally proprietary development of many products. For those that do want to do their vendor-specific development as openly as possible - and hopefully attract the involvement of customers, partners, and others in the community - there hasn't been a great venue for this so far. Some vendors that have done open development have even had difficulty finding a place in the community and ended up deciding to just develop behind closed doors. I would like to try to change this, so I am proposing the creation of the Hardware Vendor SIG as a place where we can collaborate on vendor specific things, and encourage development to happen in the open. This would be a place for any vendors and interested parties to work together to fix bugs, implement features, and overall improve the quality of anything that helps provide that glue to bridge the gap between our open source services and vendor hardware. This would include servers, storage, networking, and really anything else that plays a role in being able to set up an OpenStack cloud. This is a call out to any others interested in participating. If you are interested in this effort, and if you have any existing code (whether hosted on OpenDev, hosted on GitHub, or hosted on your own platform) that you think would be a good fit for this, please add your contact information and any relevant details here: https://etherpad.opendev.org/p/HardwareVendorSIG Also, please feel free to show your support by voting on the proposal to create this SIG here: https://review.opendev.org/#/c/745185/ Thanks! Sean From dev.faz at gmail.com Sat Aug 8 04:30:09 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Sat, 8 Aug 2020 06:30:09 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: <20200806144016.GP31915@sync> References: <20200806144016.GP31915@sync> Message-ID: Hi, we also have this issue. Our solution was (up to now) to delete the queues with a script or even reset the complete cluster. We just upgraded rabbitmq to the latest version - without luck. Anyone else seeing this issue? Fabian Arnaud Morin schrieb am Do., 6. Aug. 2020, 16:47: > Hey all, > > I would like to ask the community about a rabbit issue we have from time > to time. > > In our current architecture, we have a cluster of rabbits (3 nodes) for > all our OpenStack services (mostly nova and neutron). > > When one node of this cluster is down, the cluster continue working (we > use pause_minority strategy). > But, sometimes, the third server is not able to recover automatically > and need a manual intervention. > After this intervention, we restart the rabbitmq-server process, which > is then able to join the cluster back. > > At this time, the cluster looks ok, everything is fine. > BUT, nothing works. > Neutron and nova agents are not able to report back to servers. > They appear dead. > Servers seems not being able to consume messages. > The exchanges, queues, bindings seems good in rabbit. > > What we see is that removing bindings (using rabbitmqadmin delete > binding or the web interface) and recreate them again (using the same > routing key) brings the service back up and running. > > Doing this for all queues is really painful. Our next plan is to > automate it, but is there anyone in the community already saw this kind > of issues? > > Our bug looks like the one described in [1]. > Someone recommands to create an Alternate Exchange. > Is there anyone already tried that? 
> > FYI, we are running rabbit 3.8.2 (with OpenStack Stein). > We had the same kind of issues using older version of rabbit. > > Thanks for your help. > > [1] > https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk > > -- > Arnaud Morin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhartendu at gmail.com Sat Aug 8 07:00:51 2020 From: bhartendu at gmail.com (Bhartendu) Date: Sat, 8 Aug 2020 12:30:51 +0530 Subject: Openstack VMmachine internet connectivity issue Message-ID: Hi All, I need help or any pointer to solve Openstack VM machine internet connectivity issue. I am successfully able to create VM machines on Openstack cloud. Facing following Openstack VM machine internet connectivity issues: 1) There is no internet connectivity from Openstack VM machine. 2) ping is successful till Openstack machine (192.168.0.166), but Gateway ip (192.168.0.1) not reachable from Openstack VM machine. 3) No website reachable from Openstack VM machine. Openstack is installed on Virtualbox VM machine (CentOS 8.2). Connectivity information: Internet<=====>Router(192.168.0.1)<=====>Oracle VM machine (CentOS 8.2; 192.168.0.166)<=====>Openstack VM Machine (192.168.0.174) Any help or trigger is much appreciated. ----------------------------------------------- Openstack VM Machine (192.168.0.174) Logs ----------------------------------------------- $ ip route default via 192.168.0.1 dev eth0 169.254.169.254 via 192.168.0.171 dev eth0 192.168.0.0/24 dev eth0 scope link src 192.168.0.174 $ ping 192.168.0.166 PING 192.168.0.166 (192.168.0.166): 56 data bytes 64 bytes from 192.168.0.166: seq=0 ttl=64 time=2.657 ms 64 bytes from 192.168.0.166: seq=1 ttl=64 time=1.196 ms 64 bytes from 192.168.0.166: seq=2 ttl=64 time=1.312 ms 64 bytes from 192.168.0.166: seq=3 ttl=64 time=0.875 ms 64 bytes from 192.168.0.166: seq=4 ttl=64 time=0.782 ms ^C --- 192.168.0.166 ping statistics --- 5 packets transmitted, 5 packets received, 0% packet loss round-trip min/avg/max = 0.782/1.364/2.657 ms $ ping 192.168.0.1 PING 192.168.0.1 (192.168.0.1): 56 data bytes ^C --- 192.168.0.1 ping statistics --- 3 packets transmitted, 0 packets received, 100% packet loss $ ping google.com ping: bad address 'google.com' $ $ sudo cat /etc/resolv.conf nameserver 192.168.0.1 nameserver 192.168.0.166 nameserver 8.8.8.8 $ $ ip route default via 192.168.0.1 dev eth0 169.254.169.254 via 192.168.0.171 dev eth0 192.168.0.0/24 dev eth0 scope link src 192.168.0.174 $ $ ifconfig eth0 Link encap:Ethernet HWaddr FA:16:3E:17:F2:F9 inet addr:192.168.0.174 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fe17:f2f9/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:572 errors:0 dropped:0 overruns:0 frame:0 TX packets:571 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:51697 (50.4 KiB) TX bytes:46506 (45.4 KiB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:35 errors:0 dropped:0 overruns:0 frame:0 TX packets:35 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3764 (3.6 KiB) TX bytes:3764 (3.6 KiB) $ ----------------------------------------------------- Oracle VM machine (CentOS 8.2; 192.168.0.166) Logs ----------------------------------------------------- [root at openstack ~]# ovs-vsctl show 6718c4ce-6a58-463c-95e4-20a34edbe041 Manager "ptcp:6640:127.0.0.1" is_connected: true Bridge br-int fail_mode: 
secure datapath_type: system Port br-int Interface br-int type: internal Port "tap50a66e3f-00" Interface "tap50a66e3f-00" Port "patch-br-int-to-provnet-27804412-47c0-497d-8f2e-8c0bf8b04df1" Interface "patch-br-int-to-provnet-27804412-47c0-497d-8f2e-8c0bf8b04df1" type: patch options: {peer="patch-provnet-27804412-47c0-497d-8f2e-8c0bf8b04df1-to-br-int"} Port "tap6ea42aaf-80" Interface "tap6ea42aaf-80" Port "tap743cdf36-c8" Interface "tap743cdf36-c8" Port "tap2a93518d-90" Interface "tap2a93518d-90" Bridge br-ex fail_mode: standalone Port "patch-provnet-27804412-47c0-497d-8f2e-8c0bf8b04df1-to-br-int" Interface "patch-provnet-27804412-47c0-497d-8f2e-8c0bf8b04df1-to-br-int" type: patch options: {peer="patch-br-int-to-provnet-27804412-47c0-497d-8f2e-8c0bf8b04df1"} Port br-ex Interface br-ex type: internal Port "enp0s3" Interface "enp0s3" ovs_version: "2.12.0" [root at openstack ~]# ip a s enp0s3 2: enp0s3: mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000 link/ether 08:00:27:cd:fc:4f brd ff:ff:ff:ff:ff:ff [root at openstack ~]# ip a s br-ex 13: br-ex: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 08:00:27:cd:fc:4f brd ff:ff:ff:ff:ff:ff inet 192.168.0.166/24 brd 192.168.0.255 scope global br-ex valid_lft forever preferred_lft forever inet6 fe80::f057:6bff:fe69:1f47/64 scope link valid_lft forever preferred_lft forever [root at openstack ~]# [root at openstack ~]# cat /etc/resolv.conf # Generated by NetworkManager search example.com nameserver 192.168.0.1 nameserver 8.8.8.8 [root at openstack ~]# [root at openstack ~]# ip route default via 192.168.0.1 dev br-ex 169.254.0.0/16 dev br-ex scope link metric 1013 192.168.0.0/24 dev br-ex proto kernel scope link src 192.168.0.166 [root at openstack ~]# [root at openstack ~]# ifconfig br-ex: flags=4163 mtu 1500 inet 192.168.0.166 netmask 255.255.255.0 broadcast 192.168.0.255 inet6 fe80::f057:6bff:fe69:1f47 prefixlen 64 scopeid 0x20 ether 08:00:27:cd:fc:4f txqueuelen 1000 (Ethernet) RX packets 2781 bytes 505203 (493.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 1441 bytes 159474 (155.7 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp0s3: flags=4163 mtu 1500 ether 08:00:27:cd:fc:4f txqueuelen 1000 (Ethernet) RX packets 94856 bytes 111711956 (106.5 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 33859 bytes 15114156 (14.4 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 1741807 bytes 441610166 (421.1 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 1741807 bytes 441610166 (421.1 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 tap2a93518d-90: flags=4163 mtu 1500 inet6 fe80::681d:e6ff:fe1d:963f prefixlen 64 scopeid 0x20 ether 6a:1d:e6:1d:96:3f txqueuelen 1000 (Ethernet) RX packets 55 bytes 6540 (6.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 2950 bytes 916693 (895.2 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 tap50a66e3f-00: flags=4163 mtu 1442 ether fe:16:3e:06:df:bf txqueuelen 1000 (Ethernet) RX packets 689 bytes 59074 (57.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 703 bytes 58933 (57.5 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 tap6ea42aaf-80: flags=4163 mtu 1500 inet6 fe80::745b:6aff:fe1c:d270 prefixlen 64 scopeid 0x20 ether 76:5b:6a:1c:d2:70 txqueuelen 1000 (Ethernet) RX packets 63 bytes 7332 (7.1 KiB) RX errors 0 dropped 0 overruns 
0 frame 0 TX packets 126 bytes 10692 (10.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 tap743cdf36-c8: flags=4163 mtu 1500 ether fe:16:3e:17:f2:f9 txqueuelen 1000 (Ethernet) RX packets 1005 bytes 91770 (89.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 3555 bytes 902987 (881.8 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [root at openstack ~]# Thanks & Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Sat Aug 8 07:36:40 2020 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Sat, 8 Aug 2020 09:36:40 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> Message-ID: We also see the issue. When it happens stopping and restarting the rabbit cluster usually helps. I thought the problem was because of a wrong setting in the openstack services conf files: I missed these settings (that I am now going to add): [oslo_messaging_rabbit] rabbit_ha_queues = true amqp_durable_queues = true Cheers, Massimo On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann wrote: > Hi, > > we also have this issue. > > Our solution was (up to now) to delete the queues with a script or even > reset the complete cluster. > > We just upgraded rabbitmq to the latest version - without luck. > > Anyone else seeing this issue? > > Fabian > > > > Arnaud Morin schrieb am Do., 6. Aug. 2020, 16:47: > >> Hey all, >> >> I would like to ask the community about a rabbit issue we have from time >> to time. >> >> In our current architecture, we have a cluster of rabbits (3 nodes) for >> all our OpenStack services (mostly nova and neutron). >> >> When one node of this cluster is down, the cluster continue working (we >> use pause_minority strategy). >> But, sometimes, the third server is not able to recover automatically >> and need a manual intervention. >> After this intervention, we restart the rabbitmq-server process, which >> is then able to join the cluster back. >> >> At this time, the cluster looks ok, everything is fine. >> BUT, nothing works. >> Neutron and nova agents are not able to report back to servers. >> They appear dead. >> Servers seems not being able to consume messages. >> The exchanges, queues, bindings seems good in rabbit. >> >> What we see is that removing bindings (using rabbitmqadmin delete >> binding or the web interface) and recreate them again (using the same >> routing key) brings the service back up and running. >> >> Doing this for all queues is really painful. Our next plan is to >> automate it, but is there anyone in the community already saw this kind >> of issues? >> >> Our bug looks like the one described in [1]. >> Someone recommands to create an Alternate Exchange. >> Is there anyone already tried that? >> >> FYI, we are running rabbit 3.8.2 (with OpenStack Stein). >> We had the same kind of issues using older version of rabbit. >> >> Thanks for your help. >> >> [1] >> https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk >> >> -- >> Arnaud Morin >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
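As a side note on those two options: below is a minimal sketch of how rabbit_ha_queues / amqp_durable_queues usually fit together with a RabbitMQ mirroring policy. The "/" vhost and the policy name "ha-all" are placeholders, adjust them to your deployment. The oslo.messaging flags only change how the services declare their queues, so classic queue mirroring still needs a matching RabbitMQ policy, and queues that already exist keep their old properties until they are re-declared (service restart, possibly after deleting stale queues). None of this is guaranteed to fix the lost-bindings problem discussed here, it is just the configuration being referred to.

# In each service config (nova.conf, neutron.conf, ...):
[oslo_messaging_rabbit]
rabbit_ha_queues = true
amqp_durable_queues = true

# On one RabbitMQ node, mirror everything except the built-in amq.* names:
rabbitmqctl set_policy -p / ha-all '^(?!amq\.).*' '{"ha-mode":"all","ha-sync-mode":"automatic"}'
rabbitmqctl list_policies -p /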
URL: From dev.faz at gmail.com Sat Aug 8 13:06:36 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Sat, 8 Aug 2020 15:06:36 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> Message-ID: Hi, dont know if durable queues help, but should be enabled by rabbitmq policy which (alone) doesnt seem to fix this (we have this active) Fabian Massimo Sgaravatto schrieb am Sa., 8. Aug. 2020, 09:36: > We also see the issue. When it happens stopping and restarting the rabbit > cluster usually helps. > > I thought the problem was because of a wrong setting in the openstack > services conf files: I missed these settings (that I am now going to add): > > [oslo_messaging_rabbit] > rabbit_ha_queues = true > amqp_durable_queues = true > > Cheers, Massimo > > > On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann > wrote: > >> Hi, >> >> we also have this issue. >> >> Our solution was (up to now) to delete the queues with a script or even >> reset the complete cluster. >> >> We just upgraded rabbitmq to the latest version - without luck. >> >> Anyone else seeing this issue? >> >> Fabian >> >> >> >> Arnaud Morin schrieb am Do., 6. Aug. 2020, >> 16:47: >> >>> Hey all, >>> >>> I would like to ask the community about a rabbit issue we have from time >>> to time. >>> >>> In our current architecture, we have a cluster of rabbits (3 nodes) for >>> all our OpenStack services (mostly nova and neutron). >>> >>> When one node of this cluster is down, the cluster continue working (we >>> use pause_minority strategy). >>> But, sometimes, the third server is not able to recover automatically >>> and need a manual intervention. >>> After this intervention, we restart the rabbitmq-server process, which >>> is then able to join the cluster back. >>> >>> At this time, the cluster looks ok, everything is fine. >>> BUT, nothing works. >>> Neutron and nova agents are not able to report back to servers. >>> They appear dead. >>> Servers seems not being able to consume messages. >>> The exchanges, queues, bindings seems good in rabbit. >>> >>> What we see is that removing bindings (using rabbitmqadmin delete >>> binding or the web interface) and recreate them again (using the same >>> routing key) brings the service back up and running. >>> >>> Doing this for all queues is really painful. Our next plan is to >>> automate it, but is there anyone in the community already saw this kind >>> of issues? >>> >>> Our bug looks like the one described in [1]. >>> Someone recommands to create an Alternate Exchange. >>> Is there anyone already tried that? >>> >>> FYI, we are running rabbit 3.8.2 (with OpenStack Stein). >>> We had the same kind of issues using older version of rabbit. >>> >>> Thanks for your help. >>> >>> [1] >>> https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk >>> >>> -- >>> Arnaud Morin >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwm2012 at gmail.com Sun Aug 9 15:54:47 2020 From: pwm2012 at gmail.com (pwm) Date: Sun, 9 Aug 2020 23:54:47 +0800 Subject: DNS server instead of /etc/hosts file on Infra Server Message-ID: Hi, Anyone interested in replacing the /etc/hosts file entry with a DNS server on the openstack-ansible deployment? Thank you -------------- next part -------------- An HTML attachment was scrubbed... 
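Since automating the rebind step keeps coming up, here is a rough sketch of the manual recovery described above, done with rabbitmqadmin against the default vhost. EXCHANGE, QUEUE and ROUTING_KEY are placeholders, take the real names from the listing; this is a recovery aid only and does not touch the root cause.

# See which bindings currently exist (and which ones are missing):
rabbitmqadmin list bindings source destination destination_type routing_key

# Drop and re-create a suspect binding with the same routing key.
# For bindings without extra arguments, properties_key is normally just the routing key.
rabbitmqadmin delete binding source=EXCHANGE destination=QUEUE destination_type=queue properties_key=ROUTING_KEY
rabbitmqadmin declare binding source=EXCHANGE destination=QUEUE destination_type=queue routing_key=ROUTING_KEY

Looping that over the queues reported by "rabbitmqadmin list queues" is essentially the automation mentioned earlier in the thread.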
URL: From monika.samal at outlook.com Sun Aug 9 11:59:20 2020 From: monika.samal at outlook.com (Monika Samal) Date: Sun, 9 Aug 2020 11:59:20 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , , Message-ID: ________________________________ From: Monika Samal Sent: Friday, August 7, 2020 4:41:52 AM To: Mark Goddard ; Michael Johnson Cc: Fabian Zimmermann ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer I tried following above document still facing same Octavia connection error with amphora image. Regards, Monika ________________________________ From: Mark Goddard Sent: Thursday, August 6, 2020 1:16:01 PM To: Michael Johnson Cc: Monika Samal ; Fabian Zimmermann ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer On Wed, 5 Aug 2020 at 16:16, Michael Johnson > wrote: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Right now it's up to the operator to configure that. The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard > wrote: On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson > Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann > Cc: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. Fabian Monika Samal > schrieb am Mo., 3. Aug. 
2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From monika.samal at outlook.com Sun Aug 9 12:02:01 2020 From: monika.samal at outlook.com (Monika Samal) Date: Sun, 9 Aug 2020 12:02:01 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , , , Message-ID: Hi All, Below is the error am getting, i tried configuring network issue as well still finding it difficult to resolve. Below is my log...if somebody can help me resolving it..it would be great help since its very urgent... 
http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ Regards, Monika ________________________________ From: Monika Samal Sent: Sunday, 9 August, 2020, 5:29 pm To: Mark Goddard; Michael Johnson; openstack-discuss Cc: Fabian Zimmermann Subject: Re: [openstack-community] Octavia :; Unable to create load balancer ________________________________ From: Monika Samal Sent: Friday, August 7, 2020 4:41:52 AM To: Mark Goddard ; Michael Johnson Cc: Fabian Zimmermann ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer I tried following above document still facing same Octavia connection error with amphora image. Regards, Monika ________________________________ From: Mark Goddard Sent: Thursday, August 6, 2020 1:16:01 PM To: Michael Johnson Cc: Monika Samal ; Fabian Zimmermann ; openstack-discuss Subject: Re: [openstack-community] Octavia :; Unable to create load balancer On Wed, 5 Aug 2020 at 16:16, Michael Johnson > wrote: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Right now it's up to the operator to configure that. The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard > wrote: On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson > Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann > Cc: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. Fabian Monika Samal > schrieb am Mo., 3. Aug. 
2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Mon Aug 10 05:44:29 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Sun, 9 Aug 2020 22:44:29 -0700 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: That looks like there is still a kolla networking issue where the amphora are not able to reach the controller processes. Please fix the lb-mgmt-net such that it can reach the amphora and the controller containers. This should be setup via the deployment tool, kolla in this case. Michael On Sun, Aug 9, 2020 at 5:02 AM Monika Samal wrote: > Hi All, > > Below is the error am getting, i tried configuring network issue as well > still finding it difficult to resolve. > > Below is my log...if somebody can help me resolving it..it would be great > help since its very urgent... 
> > http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ > > Regards, > Monika > ------------------------------ > *From:* Monika Samal > *Sent:* Sunday, 9 August, 2020, 5:29 pm > *To:* Mark Goddard; Michael Johnson; openstack-discuss > *Cc:* Fabian Zimmermann > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > ------------------------------ > *From:* Monika Samal > *Sent:* Friday, August 7, 2020 4:41:52 AM > *To:* Mark Goddard ; Michael Johnson < > johnsomor at gmail.com> > *Cc:* Fabian Zimmermann ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > I tried following above document still facing same Octavia connection > error with amphora image. > > Regards, > Monika > ------------------------------ > *From:* Mark Goddard > *Sent:* Thursday, August 6, 2020 1:16:01 PM > *To:* Michael Johnson > *Cc:* Monika Samal ; Fabian Zimmermann < > dev.faz at gmail.com>; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > On Wed, 5 Aug 2020 at 16:16, Michael Johnson wrote: > > Looking at that error, it appears that the lb-mgmt-net is not setup > correctly. The Octavia controller containers are not able to reach the > amphora instances on the lb-mgmt-net subnet. > > I don't know how kolla is setup to connect the containers to the neutron > lb-mgmt-net network. Maybe the above documents will help with that. > > > Right now it's up to the operator to configure that. The kolla > documentation doesn't prescribe any particular setup. We're working on > automating it in Victoria. > > > Michael > > On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard wrote: > > > > On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: > > Hello Guys, > > With Michaels help I was able to solve the problem but now there is > another error I was able to create my network on vlan but still error > persist. PFB the logs: > > http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ > > Kindly help > > regards, > Monika > ------------------------------ > *From:* Michael Johnson > *Sent:* Monday, August 3, 2020 9:10 PM > *To:* Fabian Zimmermann > *Cc:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Yeah, it looks like nova is failing to boot the instance. > > Check this setting in your octavia.conf files: > https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id > > Also, if kolla-ansible didn't set both of these values correctly, please > open bug reports for kolla-ansible. These all should have been configured > by the deployment tool. > > > I wasn't following this thread due to no [kolla] tag, but here are the > recently added docs for Octavia in kolla [1]. Note > the octavia_service_auth_project variable which was added to migrate from > the admin project to the service project for octavia resources. We're > lacking proper automation for the flavor, image etc, but it is being worked > on in Victoria [2]. > > [1] > https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html > [2] https://review.opendev.org/740180 > > Michael > > On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: > > Seems like the flavor is missing or empty '' - check for typos and enable > debug. 
> > Check if the nova req contains valid information/flavor. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 15:46: > > It's registered > > Get Outlook for Android > ------------------------------ > *From:* Fabian Zimmermann > *Sent:* Monday, August 3, 2020 7:08:21 PM > *To:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Did you check the (nova) flavor you use in octavia. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 10:53: > > After Michael suggestion I was able to create load balancer but there is > error in status. > > > > PFB the error link: > > http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ > ------------------------------ > *From:* Monika Samal > *Sent:* Monday, August 3, 2020 2:08 PM > *To:* Michael Johnson > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Thanks a ton Michael for helping me out > ------------------------------ > *From:* Michael Johnson > *Sent:* Friday, July 31, 2020 3:57 AM > *To:* Monika Samal > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Just to close the loop on this, the octavia.conf file had > "project_name = admin" instead of "project_name = service" in the > [service_auth] section. This was causing the keystone errors when > Octavia was communicating with neutron. > > I don't know if that is a bug in kolla-ansible or was just a local > configuration issue. > > Michael > > On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > > > Hello Fabian,, > > > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > > > Regards, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > Hi, > > > > just to debug, could you replace the auth_type password with v3password? > > > > And do a curl against your :5000 and :35357 urls and paste the output. > > > > Fabian > > > > Monika Samal schrieb am Do., 30. Juli 2020, > 22:15: > > > > Hello Fabian, > > > > http://paste.openstack.org/show/796477/ > > > > Thanks, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > The sections should be > > > > service_auth > > keystone_authtoken > > > > if i read the docs correctly. Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > > > Fabian > > > -------------- next part -------------- An HTML attachment was scrubbed... 
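To verify that reachability in practice, a throw-away instance on the management network is usually the quickest test. A sketch only: image, flavor and security group below are placeholders for whatever exists in the deployment, while lb-mgmt-net is the network name used in this thread:

# boot a small test instance directly on the Octavia management network
openstack server create --image cirros --flavor m1.tiny \
  --network lb-mgmt-net --security-group <sg allowing ICMP/SSH> lb-mgmt-test

# from the host running the Octavia containers, try to reach it
ping <IP the test instance got on lb-mgmt-net>

The path has to work in both directions: the workers talk to the amphorae on TCP 9443 (amphora REST API) and the amphorae report back to the health manager on UDP 5555 by default.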
URL: From dev.faz at gmail.com Mon Aug 10 05:49:36 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 10 Aug 2020 07:49:36 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Hi, to test your connection you can create an instance im the octavia network and try to ping/ssh from your controller (dont forget a suitable security group) Fabian Michael Johnson schrieb am Mo., 10. Aug. 2020, 07:44: > > That looks like there is still a kolla networking issue where the amphora > are not able to reach the controller processes. Please fix the lb-mgmt-net > such that it can reach the amphora and the controller containers. This > should be setup via the deployment tool, kolla in this case. > > Michael > > On Sun, Aug 9, 2020 at 5:02 AM Monika Samal > wrote: > >> Hi All, >> >> Below is the error am getting, i tried configuring network issue as well >> still finding it difficult to resolve. >> >> Below is my log...if somebody can help me resolving it..it would be great >> help since its very urgent... >> >> http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ >> >> Regards, >> Monika >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Sunday, 9 August, 2020, 5:29 pm >> *To:* Mark Goddard; Michael Johnson; openstack-discuss >> *Cc:* Fabian Zimmermann >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Friday, August 7, 2020 4:41:52 AM >> *To:* Mark Goddard ; Michael Johnson < >> johnsomor at gmail.com> >> *Cc:* Fabian Zimmermann ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> I tried following above document still facing same Octavia connection >> error with amphora image. >> >> Regards, >> Monika >> ------------------------------ >> *From:* Mark Goddard >> *Sent:* Thursday, August 6, 2020 1:16:01 PM >> *To:* Michael Johnson >> *Cc:* Monika Samal ; Fabian Zimmermann < >> dev.faz at gmail.com>; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> >> >> On Wed, 5 Aug 2020 at 16:16, Michael Johnson wrote: >> >> Looking at that error, it appears that the lb-mgmt-net is not setup >> correctly. The Octavia controller containers are not able to reach the >> amphora instances on the lb-mgmt-net subnet. >> >> I don't know how kolla is setup to connect the containers to the neutron >> lb-mgmt-net network. Maybe the above documents will help with that. >> >> >> Right now it's up to the operator to configure that. The kolla >> documentation doesn't prescribe any particular setup. We're working on >> automating it in Victoria. >> >> >> Michael >> >> On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard wrote: >> >> >> >> On Tue, 4 Aug 2020 at 16:58, Monika Samal >> wrote: >> >> Hello Guys, >> >> With Michaels help I was able to solve the problem but now there is >> another error I was able to create my network on vlan but still error >> persist. 
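For the "enable debug" part, it is the standard oslo.config switch in octavia.conf plus a restart of the Octavia services; the log path below is the usual kolla location and may differ in other setups:

[DEFAULT]
debug = True

# then follow the worker while retrying the load balancer create
tail -f /var/log/kolla/octavia/octavia-worker.log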
PFB the logs: >> >> http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ >> >> Kindly help >> >> regards, >> Monika >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Monday, August 3, 2020 9:10 PM >> *To:* Fabian Zimmermann >> *Cc:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Yeah, it looks like nova is failing to boot the instance. >> >> Check this setting in your octavia.conf files: >> https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id >> >> Also, if kolla-ansible didn't set both of these values correctly, please >> open bug reports for kolla-ansible. These all should have been configured >> by the deployment tool. >> >> >> I wasn't following this thread due to no [kolla] tag, but here are the >> recently added docs for Octavia in kolla [1]. Note >> the octavia_service_auth_project variable which was added to migrate from >> the admin project to the service project for octavia resources. We're >> lacking proper automation for the flavor, image etc, but it is being worked >> on in Victoria [2]. >> >> [1] >> https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html >> [2] https://review.opendev.org/740180 >> >> Michael >> >> On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann >> wrote: >> >> Seems like the flavor is missing or empty '' - check for typos and enable >> debug. >> >> Check if the nova req contains valid information/flavor. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 15:46: >> >> It's registered >> >> Get Outlook for Android >> ------------------------------ >> *From:* Fabian Zimmermann >> *Sent:* Monday, August 3, 2020 7:08:21 PM >> *To:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Did you check the (nova) flavor you use in octavia. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 10:53: >> >> After Michael suggestion I was able to create load balancer but there is >> error in status. >> >> >> >> PFB the error link: >> >> http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Monday, August 3, 2020 2:08 PM >> *To:* Michael Johnson >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Thanks a ton Michael for helping me out >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Friday, July 31, 2020 3:57 AM >> *To:* Monika Samal >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Just to close the loop on this, the octavia.conf file had >> "project_name = admin" instead of "project_name = service" in the >> [service_auth] section. This was causing the keystone errors when >> Octavia was communicating with neutron. >> >> I don't know if that is a bug in kolla-ansible or was just a local >> configuration issue. 
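For anyone else hitting this, the corrected section ends up looking roughly as follows; apart from project_name = service every value is an illustrative placeholder for your own deployment:

[service_auth]
auth_url = http://<internal keystone endpoint>:5000
auth_type = password
username = octavia
password = <octavia service password>
user_domain_name = Default
project_domain_name = Default
# this was set to "admin" before, which caused the keystone errors described above
project_name = service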
>> >> Michael >> >> On Thu, Jul 30, 2020 at 1:39 PM Monika Samal >> wrote: >> > >> > Hello Fabian,, >> > >> > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ >> > >> > Regards, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:57 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > Hi, >> > >> > just to debug, could you replace the auth_type password with v3password? >> > >> > And do a curl against your :5000 and :35357 urls and paste the output. >> > >> > Fabian >> > >> > Monika Samal schrieb am Do., 30. Juli 2020, >> 22:15: >> > >> > Hello Fabian, >> > >> > http://paste.openstack.org/show/796477/ >> > >> > Thanks, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:38 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > The sections should be >> > >> > service_auth >> > keystone_authtoken >> > >> > if i read the docs correctly. Maybe you can just paste your config >> (remove/change passwords) to paste.openstack.org and post the link? >> > >> > Fabian >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From marino.mrc at gmail.com Mon Aug 10 07:49:28 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Mon, 10 Aug 2020 09:49:28 +0200 Subject: [ironic][tripleo][ussuri] Problem with bare metal provisioning and old RAID controllers In-Reply-To: References: Message-ID: Hi, I'm sorry if I reopen this thread, but I cannot find a solution at the moment. Please, can someone give me some hint on how to detect megaraid controllers with IPA? I think this could be useful for many users. PS: I can do any test, I have a 6 servers test environment (5 nodes + undercloud) with megaraid controllers (Poweredge R620) Thank you Il giorno mar 4 ago 2020 alle ore 12:57 Marco Marino ha scritto: > Here is what I did: > # /usr/lib/dracut/skipcpio > /home/stack/images/ironic-python-agent.initramfs | zcat | cpio -ivd | pax -r > # mount dd-megaraid_sas-07.710.50.00-1.el8_2.elrepo.iso /mnt/ > # rpm2cpio > /mnt/rpms/x86_64/kmod-megaraid_sas-07.710.50.00-1.el8_2.elrepo.x86_64.rpm | > pax -r > # find . 
2>/dev/null | cpio --quiet -c -o | gzip -8 > > /home/stack/images/ironic-python-agent.initramfs > # chown stack: /home/stack/images/ironic-python-agent.initramfs > (undercloud) [stack at undercloud ~]$ openstack overcloud image upload > --update-existing --image-path /home/stack/images/ > > At this point I checked that agent.ramdisk in /var/lib/ironic/httpboot has > an update timestamp > > Then > (undercloud) [stack at undercloud ~]$ openstack overcloud node introspect > --provide controller2 > /usr/lib64/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't > resolve package from __spec__ or __package__, falling back on __name__ and > __path__ > return f(*args, **kwds) > > PLAY [Baremetal Introspection for multiple Ironic Nodes] > *********************** > 2020-08-04 12:32:26.684368 | ecf4bbd2-e605-20dd-3da9-000000000008 | > TASK | Check for required inputs > 2020-08-04 12:32:26.739797 | ecf4bbd2-e605-20dd-3da9-000000000008 | > SKIPPED | Check for required inputs | localhost | item=node_uuids > 2020-08-04 12:32:26.746684 | ecf4bbd2-e605-20dd-3da9-00000000000a | > TASK | Set node_uuids_intro fact > [WARNING]: Failure using method (v2_playbook_on_task_start) in callback > plugin > ( 0x7f1b0f9bce80>): maximum recursion depth exceeded while calling a Python > object > 2020-08-04 12:32:26.828985 | ecf4bbd2-e605-20dd-3da9-00000000000a | > OK | Set node_uuids_intro fact | localhost > 2020-08-04 12:32:26.834281 | ecf4bbd2-e605-20dd-3da9-00000000000c | > TASK | Notice > 2020-08-04 12:32:26.911106 | ecf4bbd2-e605-20dd-3da9-00000000000c | > SKIPPED | Notice | localhost > 2020-08-04 12:32:26.916344 | ecf4bbd2-e605-20dd-3da9-00000000000e | > TASK | Set concurrency fact > 2020-08-04 12:32:26.994087 | ecf4bbd2-e605-20dd-3da9-00000000000e | > OK | Set concurrency fact | localhost > 2020-08-04 12:32:27.005932 | ecf4bbd2-e605-20dd-3da9-000000000010 | > TASK | Check if validation enabled > 2020-08-04 12:32:27.116425 | ecf4bbd2-e605-20dd-3da9-000000000010 | > SKIPPED | Check if validation enabled | localhost > 2020-08-04 12:32:27.129120 | ecf4bbd2-e605-20dd-3da9-000000000011 | > TASK | Run Validations > 2020-08-04 12:32:27.239850 | ecf4bbd2-e605-20dd-3da9-000000000011 | > SKIPPED | Run Validations | localhost > 2020-08-04 12:32:27.251796 | ecf4bbd2-e605-20dd-3da9-000000000012 | > TASK | Fail if validations are disabled > 2020-08-04 12:32:27.362050 | ecf4bbd2-e605-20dd-3da9-000000000012 | > SKIPPED | Fail if validations are disabled | localhost > 2020-08-04 12:32:27.373947 | ecf4bbd2-e605-20dd-3da9-000000000014 | > TASK | Start baremetal introspection > > > 2020-08-04 12:48:19.944028 | ecf4bbd2-e605-20dd-3da9-000000000014 | > CHANGED | Start baremetal introspection | localhost > 2020-08-04 12:48:19.966517 | ecf4bbd2-e605-20dd-3da9-000000000015 | > TASK | Nodes that passed introspection > 2020-08-04 12:48:20.130913 | ecf4bbd2-e605-20dd-3da9-000000000015 | > OK | Nodes that passed introspection | localhost | result={ > "changed": false, > "msg": " 00c5e81b-1e5d-442b-b64f-597a604051f7" > } > 2020-08-04 12:48:20.142919 | ecf4bbd2-e605-20dd-3da9-000000000016 | > TASK | Nodes that failed introspection > 2020-08-04 12:48:20.305004 | ecf4bbd2-e605-20dd-3da9-000000000016 | > OK | Nodes that failed introspection | localhost | result={ > "changed": false, > "failed_when_result": false, > "msg": " All nodes completed introspection successfully!" 
> } > 2020-08-04 12:48:20.316860 | ecf4bbd2-e605-20dd-3da9-000000000017 | > TASK | Node introspection failed and no results are provided > 2020-08-04 12:48:20.427675 | ecf4bbd2-e605-20dd-3da9-000000000017 | > SKIPPED | Node introspection failed and no results are provided | localhost > > PLAY RECAP > ********************************************************************* > localhost : ok=5 changed=1 unreachable=0 > failed=0 skipped=6 rescued=0 ignored=0 > [WARNING]: Failure using method (v2_playbook_on_stats) in callback plugin > ( 0x7f1b0f9bce80>): _output() missing 1 required positional argument: 'color' > Successfully introspected nodes: ['controller2'] > Exception occured while running the command > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", > line 340, in prepare_command > cmdline_args = self.loader.load_file('args', string_types, > encoding=None) > File "/usr/lib/python3.6/site-packages/ansible_runner/loader.py", line > 164, in load_file > contents = parsed_data = self.get_contents(path) > File "/usr/lib/python3.6/site-packages/ansible_runner/loader.py", line > 98, in get_contents > raise ConfigurationError('specified path does not exist %s' % path) > ansible_runner.exceptions.ConfigurationError: specified path does not > exist /tmp/tripleop89yr8i8/args > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line > 34, in run > super(Command, self).run(parsed_args) > File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line > 41, in run > return super(Command, self).run(parsed_args) > File "/usr/lib/python3.6/site-packages/cliff/command.py", line 187, in > run > return_code = self.take_action(parsed_args) or 0 > File > "/usr/lib/python3.6/site-packages/tripleoclient/v2/overcloud_node.py", line > 210, in take_action > node_uuids=parsed_args.node_uuids, > File > "/usr/lib/python3.6/site-packages/tripleoclient/workflows/baremetal.py", > line 134, in provide > 'node_uuids': node_uuids > File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line > 659, in run_ansible_playbook > runner_config.prepare() > File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", > line 174, in prepare > self.prepare_command() > File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", > line 346, in prepare_command > self.command = self.generate_ansible_command() > File "/usr/lib/python3.6/site-packages/ansible_runner/runner_config.py", > line 415, in generate_ansible_command > v = 'v' * self.verbosity > TypeError: can't multiply sequence by non-int of type 'ClientManager' > can't multiply sequence by non-int of type 'ClientManager' > (undercloud) [stack at undercloud ~]$ > > > and > (undercloud) [stack at undercloud ~]$ openstack baremetal node show > controller2 > .... 
> | properties | {'local_gb': '0', 'cpus': '24', 'cpu_arch': > 'x86_64', 'memory_mb': '32768', 'capabilities': > 'cpu_vt:true,cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,cpu_txt:true'} > > > It seems that megaraid driver is correctly inserted in ramdisk: > # lsinitrd /var/lib/ironic/httpboot/agent.ramdisk | grep megaraid > /bin/lsinitrd: line 276: warning: command substitution: ignored null byte > in input > -rw-r--r-- 1 root root 50 Apr 28 21:55 > etc/depmod.d/kmod-megaraid_sas.conf > drwxr-xr-x 2 root root 0 Aug 4 12:13 > usr/lib/modules/4.18.0-193.6.3.el8_2.x86_64/kernel/drivers/scsi/megaraid > -rw-r--r-- 1 root root 68240 Aug 4 12:13 > usr/lib/modules/4.18.0-193.6.3.el8_2.x86_64/kernel/drivers/scsi/megaraid/megaraid_sas.ko.xz > drwxr-xr-x 2 root root 0 Apr 28 21:55 > usr/lib/modules/4.18.0-193.el8.x86_64/extra/megaraid_sas > -rw-r--r-- 1 root root 309505 Apr 28 21:55 > usr/lib/modules/4.18.0-193.el8.x86_64/extra/megaraid_sas/megaraid_sas.ko > drwxr-xr-x 2 root root 0 Apr 28 21:55 > usr/share/doc/kmod-megaraid_sas-07.710.50.00 > -rw-r--r-- 1 root root 18092 Apr 28 21:55 > usr/share/doc/kmod-megaraid_sas-07.710.50.00/GPL-v2.0.txt > -rw-r--r-- 1 root root 1152 Apr 28 21:55 > usr/share/doc/kmod-megaraid_sas-07.710.50.00/greylist.txt > > If the solution is to use a Centos7 ramdisk, please can you give me some > hint? I have no idea on how to build a new ramdisk from scratch > Thank you > > > > > > > > > Il giorno mar 4 ago 2020 alle ore 12:33 Dmitry Tantsur < > dtantsur at redhat.com> ha scritto: > >> Hi, >> >> On Tue, Aug 4, 2020 at 11:58 AM Marco Marino >> wrote: >> >>> Hi, I'm trying to install openstack Ussuri on Centos 8 hardware using >>> tripleo. I'm using a relatively old hardware (dell PowerEdge R620) with old >>> RAID controllers, deprecated in RHEL8/Centos8. Here is some basic >>> information: >>> # lspci | grep -i raid >>> 00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset >>> SATA RAID Controller (rev 05) >>> 02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2008 [Falcon] >>> (rev 03) >>> >>> I'm able to manually install centos 8 using DUD driver from here -> >>> https://elrepo.org/linux/dud/el8/x86_64/dd-megaraid_sas-07.710.50.00-1.el8_2.elrepo.iso >>> (basically I add inst.dd and I use an usb pendrive with iso). >>> Is there a way to do bare metal provisioning using openstack on this >>> kind of server? 
At the moment, when I launch "openstack overcloud node >>> introspect --provide controller1" it doesn't recognize disks (local_gb = 0 >>> in properties) and in inspector logs I see: >>> Jun 22 11:12:42 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:42.261 1543 DEBUG root [-] Still waiting for the root >>> device to appear, attempt 1 of 10 wait_for_disks >>> /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:652 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.299 1543 DEBUG oslo_concurrency.processutils [-] >>> Running cmd (subprocess): udevadm settle execute >>> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.357 1543 DEBUG oslo_concurrency.processutils [-] CMD >>> "udevadm settle" returned: 0 in 0.058s execute >>> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.392 1543 DEBUG ironic_lib.utils [-] Execution >>> completed, command line is "udevadm settle" execute >>> /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.426 1543 DEBUG ironic_lib.utils [-] Command stdout is: >>> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.460 1543 DEBUG ironic_lib.utils [-] Command stderr is: >>> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.496 1543 WARNING root [-] Path /dev/disk/by-path is >>> inaccessible, /dev/disk/by-path/* version of block device name is >>> unavailable Cause: [Errno 2] No such file or directory: >>> '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or >>> directory: '/dev/disk/by-path' >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.549 1543 DEBUG oslo_concurrency.processutils [-] >>> Running cmd (subprocess): lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE execute >>> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.647 1543 DEBUG oslo_concurrency.processutils [-] CMD >>> "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" returned: 0 in 0.097s execute >>> /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.683 1543 DEBUG ironic_lib.utils [-] Execution >>> completed, command line is "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE" >>> execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.719 1543 DEBUG ironic_lib.utils [-] Command stdout is: >>> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103 >>> Jun 22 11:12:45 localhost.localdomain ironic-python-agent[1543]: >>> 2018-06-22 11:12:45.755 1543 DEBUG ironic_lib.utils [-] Command stderr is: >>> "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104 >>> >>> Is there a way to solve the issue? For example, can I modify ramdisk and >>> include DUD driver? 
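If patching the existing ramdisk keeps getting messy, building a fresh one with ironic-python-agent-builder is another route (the extra elrepo driver package would still have to be injected afterwards, or added through a custom element). A minimal sketch; the output prefix is arbitrary and the exact options vary between versions, so check ironic-python-agent-builder --help:

# on the undercloud
pip install ironic-python-agent-builder

# build a CentOS-based IPA kernel/ramdisk pair; a specific release can
# normally be selected with --release (e.g. 7 or 8)
ironic-python-agent-builder -o /home/stack/images/ironic-python-agent centos

# then upload the refreshed images as usual
openstack overcloud image upload --update-existing --image-path /home/stack/images/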
I tried this guide: >>> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/partner_integration/overcloud_images#initrd_modifying_the_initial_ramdisks >>> >>> but I don't know how to include an ISO instead of an rpm packet as >>> described in the example. >>> >> >> Indeed, I don't think you can use ISO as it is, you'll need to figure out >> what is inside. If it's an RPM (as I assume), you'll need to extract it and >> install into the ramdisk. >> >> If nothing helps, you can try building a ramdisk with CentOS 7, the >> (very) recent versions of ironic-python-agent-builder allow using Python 3 >> on CentOS 7. >> >> Dmitry >> >> >>> Thank you, >>> Marco >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Aug 10 08:07:04 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 10 Aug 2020 09:07:04 +0100 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: On Mon, 10 Aug 2020 at 06:44, Michael Johnson wrote: > > That looks like there is still a kolla networking issue where the amphora > are not able to reach the controller processes. Please fix the lb-mgmt-net > such that it can reach the amphora and the controller containers. This > should be setup via the deployment tool, kolla in this case. > As mentioned before, Kolla doesn't currently do this - it is up to the user. We're improving the integration in the Victoria cycle. > Michael > > On Sun, Aug 9, 2020 at 5:02 AM Monika Samal > wrote: > >> Hi All, >> >> Below is the error am getting, i tried configuring network issue as well >> still finding it difficult to resolve. >> >> Below is my log...if somebody can help me resolving it..it would be great >> help since its very urgent... >> >> http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ >> >> Regards, >> Monika >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Sunday, 9 August, 2020, 5:29 pm >> *To:* Mark Goddard; Michael Johnson; openstack-discuss >> *Cc:* Fabian Zimmermann >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Friday, August 7, 2020 4:41:52 AM >> *To:* Mark Goddard ; Michael Johnson < >> johnsomor at gmail.com> >> *Cc:* Fabian Zimmermann ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> I tried following above document still facing same Octavia connection >> error with amphora image. >> >> Regards, >> Monika >> ------------------------------ >> *From:* Mark Goddard >> *Sent:* Thursday, August 6, 2020 1:16:01 PM >> *To:* Michael Johnson >> *Cc:* Monika Samal ; Fabian Zimmermann < >> dev.faz at gmail.com>; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> >> >> On Wed, 5 Aug 2020 at 16:16, Michael Johnson wrote: >> >> Looking at that error, it appears that the lb-mgmt-net is not setup >> correctly. The Octavia controller containers are not able to reach the >> amphora instances on the lb-mgmt-net subnet. >> >> I don't know how kolla is setup to connect the containers to the neutron >> lb-mgmt-net network. Maybe the above documents will help with that. >> >> >> Right now it's up to the operator to configure that. The kolla >> documentation doesn't prescribe any particular setup. 
We're working on >> automating it in Victoria. >> >> >> Michael >> >> On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard wrote: >> >> >> >> On Tue, 4 Aug 2020 at 16:58, Monika Samal >> wrote: >> >> Hello Guys, >> >> With Michaels help I was able to solve the problem but now there is >> another error I was able to create my network on vlan but still error >> persist. PFB the logs: >> >> http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ >> >> Kindly help >> >> regards, >> Monika >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Monday, August 3, 2020 9:10 PM >> *To:* Fabian Zimmermann >> *Cc:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Yeah, it looks like nova is failing to boot the instance. >> >> Check this setting in your octavia.conf files: >> https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id >> >> Also, if kolla-ansible didn't set both of these values correctly, please >> open bug reports for kolla-ansible. These all should have been configured >> by the deployment tool. >> >> >> I wasn't following this thread due to no [kolla] tag, but here are the >> recently added docs for Octavia in kolla [1]. Note >> the octavia_service_auth_project variable which was added to migrate from >> the admin project to the service project for octavia resources. We're >> lacking proper automation for the flavor, image etc, but it is being worked >> on in Victoria [2]. >> >> [1] >> https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html >> [2] https://review.opendev.org/740180 >> >> Michael >> >> On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann >> wrote: >> >> Seems like the flavor is missing or empty '' - check for typos and enable >> debug. >> >> Check if the nova req contains valid information/flavor. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 15:46: >> >> It's registered >> >> Get Outlook for Android >> ------------------------------ >> *From:* Fabian Zimmermann >> *Sent:* Monday, August 3, 2020 7:08:21 PM >> *To:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Did you check the (nova) flavor you use in octavia. >> >> Fabian >> >> Monika Samal schrieb am Mo., 3. Aug. 2020, >> 10:53: >> >> After Michael suggestion I was able to create load balancer but there is >> error in status. >> >> >> >> PFB the error link: >> >> http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ >> ------------------------------ >> *From:* Monika Samal >> *Sent:* Monday, August 3, 2020 2:08 PM >> *To:* Michael Johnson >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Thanks a ton Michael for helping me out >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Friday, July 31, 2020 3:57 AM >> *To:* Monika Samal >> *Cc:* Fabian Zimmermann ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Just to close the loop on this, the octavia.conf file had >> "project_name = admin" instead of "project_name = service" in the >> [service_auth] section. 
This was causing the keystone errors when >> Octavia was communicating with neutron. >> >> I don't know if that is a bug in kolla-ansible or was just a local >> configuration issue. >> >> Michael >> >> On Thu, Jul 30, 2020 at 1:39 PM Monika Samal >> wrote: >> > >> > Hello Fabian,, >> > >> > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ >> > >> > Regards, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:57 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > Hi, >> > >> > just to debug, could you replace the auth_type password with v3password? >> > >> > And do a curl against your :5000 and :35357 urls and paste the output. >> > >> > Fabian >> > >> > Monika Samal schrieb am Do., 30. Juli 2020, >> 22:15: >> > >> > Hello Fabian, >> > >> > http://paste.openstack.org/show/796477/ >> > >> > Thanks, >> > Monika >> > ________________________________ >> > From: Fabian Zimmermann >> > Sent: Friday, July 31, 2020 1:38 AM >> > To: Monika Samal >> > Cc: Michael Johnson ; Amy Marrich ; >> openstack-discuss ; >> community at lists.openstack.org >> > Subject: Re: [openstack-community] Octavia :; Unable to create load >> balancer >> > >> > The sections should be >> > >> > service_auth >> > keystone_authtoken >> > >> > if i read the docs correctly. Maybe you can just paste your config >> (remove/change passwords) to paste.openstack.org and post the link? >> > >> > Fabian >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Mon Aug 10 08:13:24 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Mon, 10 Aug 2020 10:13:24 +0200 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients Message-ID: Hi, during the last PTG the TC discussed the problem of supporting different clients (OpenStack Client - OSC vs python-*clients) [1]. Currently, we don't have feature parity between the OSC and the python-*clients. Different OpenStack projects invest in different clients. This can be a huge problem for users/ops. Depending on the projects deployed in their infrastructures, they need to use different clients for different tasks. It's confusing because of the partial implementation in the OSC. There was also the proposal to enforce new functionality only in the SDK (and optionally the OSC) and not the project’s specific clients to stop increasing the disparity between the two. We would like to understand first the problems and missing pieces that projects are facing to move into OSC and help to overcome them. Let us know. Belmiro, on behalf of the TC [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015418.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From dikonoor at in.ibm.com Mon Aug 10 08:16:57 2020 From: dikonoor at in.ibm.com (Divya K Konoor) Date: Mon, 10 Aug 2020 13:46:57 +0530 Subject: [openstack-community] Keystone and DBNonExistent Errors In-Reply-To: References: Message-ID: Hi, I am using OpenStack Keystone Stein and run into the below error often where Keystone public process(listening to 5000) is running inside Apache httpd runs into the below. This problem is resolved with a restart of httpd service. Has anyone run into a similar issue ? 
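One detail worth pinning down first: the traceback further below fails with sqlite3.OperationalError, which suggests that particular keystone process was not talking to MariaDB at all when the error occurred. Confirming what each vhost actually loads is cheap (default path shown; adjust for your packaging):

# the connection string keystone is supposed to use
grep '^connection' /etc/keystone/keystone.conf
# expected something along the lines of (placeholder credentials/host):
# connection = mysql+pymysql://keystone:<password>@<db host>/keystone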
This is seen soon after httpd is restarted and does not happen all the time. My environment has MariaDB backend. This problem is not limited to the assignment table and is seen across all other tables in Keystone. MariaDB service is functional and all the tables are in place. [Fri Aug 07 08:20:59.936087 2020] [:info] [pid 1420287] mod_wsgi (pid=1420287, process='keystone-public', application=''): Loading WSGI script '/usr/bin/keystone-wsgi-public'. [Fri Aug 07 08:20:59.936089 2020] [:info] [pid 1420288] mod_wsgi (pid=1420288, process='keystone-admin', application=''): Loading WSGI script '/usr/bin/keystone-wsgi-admin'. [Fri Aug 07 08:20:59.943431 2020] [ssl:info] [pid 1420290] [client 1.2.3.95:35762] AH01964: Connection to child 1 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:00.009317 2020] [ssl:info] [pid 1420291] [client 1.2.3.113:60132] AH01964: Connection to child 2 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:01.243594 2020] [ssl:info] [pid 1420289] [client 1.2.3.50:53996] AH01964: Connection to child 0 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:01.386329 2020] [ssl:info] [pid 1420293] [client x.x.x.x:38645] AH01964: Connection to child 4 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:01.824041 2020] [ssl:info] [pid 1420349] [client 1.2.3.101:42974] AH01964: Connection to child 5 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:02.949166 2020] [ssl:info] [pid 1420378] [client 1.2.3.50:54014] AH01964: Connection to child 9 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:02.949172 2020] [ssl:info] [pid 1420379] [client 1.2.3.80:46924] AH01964: Connection to child 10 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:03.286057 2020] [:info] [pid 1420287] mod_wsgi (pid=1420287): Create interpreter '1.2.3.50:5000|'. [Fri Aug 07 08:21:03.287286 2020] [:info] [pid 1420287] [remote 1.2.3.95:156] mod_wsgi (pid=1420287, process='keystone-public', application='1.2.3.50:5000|'): Loading WSGI script '/usr/bin/keystone-wsgi-public'. [Fri Aug 07 08:21:04.675059 2020] [ssl:info] [pid 1420436] [client 1.2.3.50:54032] AH01964: Connection to child 12 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:04.705975 2020] [ssl:info] [pid 1420437] [client 1.2.3.107:59554] AH01964: Connection to child 13 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:06.960940 2020] [ssl:info] [pid 1420438] [client 1.2.3.80:46970] AH01964: Connection to child 14 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:07.661670 2020] [ssl:info] [pid 1420349] [client 1.2.3.50:54124] AH01964: Connection to child 5 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:07.683383 2020] [ssl:info] [pid 1420292] [client x.x.x.x:30065] AH01964: Connection to child 3 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:08.442956 2020] [:error] [pid 1420287] [remote x.x.x.x:144] mod_wsgi (pid=1420287): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'. 
[Fri Aug 07 08:21:08.443002 2020] [:error] [pid 1420287] [remote x.x.x.x:144] Traceback (most recent call last): [Fri Aug 07 08:21:08.443017 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 2309, in __call__ [Fri Aug 07 08:21:08.443509 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.wsgi_app(environ, start_response) [Fri Aug 07 08:21:08.443525 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/werkzeug/contrib/fixers.py", line 152, in __call__ [Fri Aug 07 08:21:08.443630 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.app(environ, start_response) [Fri Aug 07 08:21:08.443644 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.443746 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 07 08:21:08.443756 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.443773 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.443781 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__ [Fri Aug 07 08:21:08.443844 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = req.get_response(self.application) .. ... .... .. [Fri Aug 07 08:21:08.450055 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 1216, in creator [Fri Aug 07 08:21:08.450071 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return fn(*arg, **kw) [Fri Aug 07 08:21:08.450080 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 128, in get_roles_for_user_and_project [Fri Aug 07 08:21:08.450975 2020] [:error] [pid 1420287] [remote x.x.x.x:144] user_id=user_id, project_id=project_id, effective=True) [Fri Aug 07 08:21:08.450985 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 116, in wrapped [Fri Aug 07 08:21:08.451001 2020] [:error] [pid 1420287] [remote x.x.x.x:144] __ret_val = __f(*args, **kwargs) [Fri Aug 07 08:21:08.451009 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 999, in list_role_assignments [Fri Aug 07 08:21:08.451025 2020] [:error] [pid 1420287] [remote x.x.x.x:144] strip_domain_roles) [Fri Aug 07 08:21:08.451033 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 845, in _list_effective_role_assignments [Fri Aug 07 08:21:08.451049 2020] [:error] [pid 1420287] [remote x.x.x.x:144] domain_id=domain_id, inherited=inherited) [Fri Aug 07 08:21:08.451057 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 780, in list_role_assignments_for_actor [Fri Aug 07 08:21:08.451072 2020] [:error] [pid 1420287] [remote x.x.x.x:144] group_ids=group_ids, inherited_to_projects=False) [Fri Aug 07 08:21:08.451081 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/backends/sql.py", line 248, in list_role_assignments [Fri Aug 07 08:21:08.451599 2020] [:error] [pid 1420287] 
[remote x.x.x.x:144] return [denormalize_role(ref) for ref in query.all()] [Fri Aug 07 08:21:08.451609 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2925, in all [Fri Aug 07 08:21:08.451632 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return list(self) [Fri Aug 07 08:21:08.451641 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3081, in __iter__ [Fri Aug 07 08:21:08.451656 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self._execute_and_instances(context) [Fri Aug 07 08:21:08.451665 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3106, in _execute_and_instances [Fri Aug 07 08:21:08.451683 2020] [:error] [pid 1420287] [remote x.x.x.x:144] result = conn.execute(querycontext.statement, self._params) [Fri Aug 07 08:21:08.451691 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 980, in execute [Fri Aug 07 08:21:08.451711 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return meth(self, multiparams, params) [Fri Aug 07 08:21:08.451720 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection [Fri Aug 07 08:21:08.451736 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return connection._execute_clauseelement(self, multiparams, params) [Fri Aug 07 08:21:08.451745 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement [Fri Aug 07 08:21:08.451762 2020] [:error] [pid 1420287] [remote x.x.x.x:144] distilled_params, [Fri Aug 07 08:21:08.451771 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context [Fri Aug 07 08:21:08.451786 2020] [:error] [pid 1420287] [remote x.x.x.x:144] e, statement, parameters, cursor, context [Fri Aug 07 08:21:08.451795 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1456, in _handle_dbapi_exception [Fri Aug 07 08:21:08.451810 2020] [:error] [pid 1420287] [remote x.x.x.x:144] util.raise_from_cause(newraise, exc_info) [Fri Aug 07 08:21:08.451818 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause [Fri Aug 07 08:21:08.451834 2020] [:error] [pid 1420287] [remote x.x.x.x:144] reraise(type(exception), exception, tb=exc_tb, cause=cause) [Fri Aug 07 08:21:08.451843 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context [Fri Aug 07 08:21:08.451858 2020] [:error] [pid 1420287] [remote x.x.x.x:144] cursor, statement, parameters, context [Fri Aug 07 08:21:08.451866 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute [Fri Aug 07 08:21:08.451882 2020] [:error] [pid 1420287] [remote x.x.x.x:144] cursor.execute(statement, parameters) [Fri Aug 07 08:21:08.451923 2020] [:error] [pid 1420287] [remote x.x.x.x:144] DBNonExistentTable: (sqlite3.OperationalError) no such table: assignment [SQL: u'SELECT assignment.type AS assignment_type, 
assignment.actor_id AS assignment_actor_id, assignment.target_id AS assignment_target_id, assignment.role_id AS assignment_role_id, assignment.inherited AS assignment_inherited \\nFROM assignment \\nWHERE assignment.actor_id IN (?) AND assignment.target_id IN (?) AND assignment.type IN (?) AND assignment.inherited = 0'] [parameters: ('15c2fe91e053af57a997c568c117c908d59c138f996bdc19ae97e9f16df12345', '12345978536e45ab8a279e2b0fa4f947', 'UserProject')] (Background on this error at: http://sqlalche.me/e/e3q8) Regards, Divya -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Mon Aug 10 08:26:24 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 10 Aug 2020 10:26:24 +0200 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: Message-ID: On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > during the last PTG the TC discussed the problem of supporting different > clients (OpenStack Client - OSC vs python-*clients) [1]. > Currently, we don't have feature parity between the OSC and the > python-*clients. > Is it true of any client? I guess some are just OSC plugins 100%. Do we know which clients have this disparity? Personally, I encountered this with Glance the most and Cinder to some extent (but I believe over the course of action Cinder got all features I wanted from it in the OSC). -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltoscano at redhat.com Mon Aug 10 08:37:22 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Mon, 10 Aug 2020 10:37:22 +0200 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: Message-ID: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> On Monday, 10 August 2020 10:26:24 CEST Radosław Piliszek wrote: > On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < > > moreira.belmiro.email.lists at gmail.com> wrote: > > Hi, > > during the last PTG the TC discussed the problem of supporting different > > clients (OpenStack Client - OSC vs python-*clients) [1]. > > Currently, we don't have feature parity between the OSC and the > > python-*clients. > > Is it true of any client? I guess some are just OSC plugins 100%. > Do we know which clients have this disparity? > Personally, I encountered this with Glance the most and Cinder to some > extent (but I believe over the course of action Cinder got all features I > wanted from it in the OSC). As far as I know there is still a huge problem with microversion handling which impacts some cinder features. It has been discussed in the past and still present. -- Luigi From thierry at openstack.org Mon Aug 10 10:01:37 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 10 Aug 2020 12:01:37 +0200 Subject: [largescale-sig] Next meeting: August 12, 16utc Message-ID: <6e7a4e43-08f4-3030-2eb0-9311f27d9647@openstack.org> Hi everyone, In order to accommodate US members, the Large Scale SIG recently decided to rotate between an EU-APAC-friendly time and an US-EU-friendly time. 
Our next meeting will be the first US-EU meeting, on Wednesday, August 12 at 16 UTC[1] in the #openstack-meeting-3 channel on IRC: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200812T16 Feel free to add topics to our agenda at: https://etherpad.openstack.org/p/large-scale-sig-meeting A reminder of the TODOs we had from last meeting, in case you have time to make progress on them: - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation Talk to you all on Wednesday, -- Thierry Carrez From emilien at redhat.com Mon Aug 10 12:29:22 2020 From: emilien at redhat.com (Emilien Macchi) Date: Mon, 10 Aug 2020 08:29:22 -0400 Subject: [puppet][congress] Retiring puppet-congress In-Reply-To: References: Message-ID: On Sat, Jun 20, 2020 at 12:44 PM Takashi Kajinami wrote: > Hello, > > > As you know, Congress project has been retired already[1], > so we will retire its puppet module, puppet-congress in > openstack puppet project as well. > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014292.html > > Because congress was directly retired instead of getting migrated > to x namespace, we'll follow the same way about puppet-congress retirement > and won't create x/puppet-congress. > > Thank you for the contribution made for the project ! > Please let us know if you have any concerns about this retirement. > +2 -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From ildiko.vancsa at gmail.com Mon Aug 10 12:31:38 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Mon, 10 Aug 2020 14:31:38 +0200 Subject: [upstream-institute] Virtual training sign-up and planning Message-ID: Hi mentors, I’m reaching out to you as the next Open Infrastructure Summit is approaching quickly so it is time to start planning for the next OpenStack Upstream Institute. As the next event will be virtual we will need to re-think the training format and experience to make sure our audience gets the most out of it. I created a new entry on our training occasions wiki page here: https://wiki.openstack.org/wiki/OpenStack_Upstream_Institute_Occasions#Virtual_Training.2C_2020 Please __sign up on the wiki__ if you would like to participate in the preparations and running the virtual training. As it is still vacation season I think we can target the last week of August or first week of September to have the first prep meeting and can collect ideas here or discuss them on the #openstack-upstream-institute IRC channel on Freenode in the meantime. Please let me know if you have any questions or need any help with signing up on the wiki. 
Thanks and Best Regards, Ildikó From monika.samal at outlook.com Mon Aug 10 07:32:06 2020 From: monika.samal at outlook.com (Monika Samal) Date: Mon, 10 Aug 2020 07:32:06 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: Sure, I am trying and will confirm shortly Get Outlook for Android ________________________________ From: Fabian Zimmermann Sent: Monday, August 10, 2020 11:19:36 AM To: Michael Johnson Cc: Monika Samal ; openstack-discuss ; Mark Goddard Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Hi, to test your connection you can create an instance im the octavia network and try to ping/ssh from your controller (dont forget a suitable security group) Fabian Michael Johnson > schrieb am Mo., 10. Aug. 2020, 07:44: That looks like there is still a kolla networking issue where the amphora are not able to reach the controller processes. Please fix the lb-mgmt-net such that it can reach the amphora and the controller containers. This should be setup via the deployment tool, kolla in this case. Michael On Sun, Aug 9, 2020 at 5:02 AM Monika Samal > wrote: Hi All, Below is the error am getting, i tried configuring network issue as well still finding it difficult to resolve. Below is my log...if somebody can help me resolving it..it would be great help since its very urgent... http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ Regards, Monika ________________________________ From: Monika Samal > Sent: Sunday, 9 August, 2020, 5:29 pm To: Mark Goddard; Michael Johnson; openstack-discuss Cc: Fabian Zimmermann Subject: Re: [openstack-community] Octavia :; Unable to create load balancer ________________________________ From: Monika Samal > Sent: Friday, August 7, 2020 4:41:52 AM To: Mark Goddard >; Michael Johnson > Cc: Fabian Zimmermann >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer I tried following above document still facing same Octavia connection error with amphora image. Regards, Monika ________________________________ From: Mark Goddard > Sent: Thursday, August 6, 2020 1:16:01 PM To: Michael Johnson > Cc: Monika Samal >; Fabian Zimmermann >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer On Wed, 5 Aug 2020 at 16:16, Michael Johnson > wrote: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Right now it's up to the operator to configure that. The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard > wrote: On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson > Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann > Cc: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. 
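When nova fails the amphora boot, the fault is recorded on the instance itself and can be read back directly; a quick look, assuming the amphorae are owned by the Octavia service project so an admin view is needed (names below are defaults/placeholders):

# list amphora instances across projects
openstack server list --all-projects --name amphora

# show the failure reason recorded by nova
openstack server show <amphora server id> -c status -c fault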
Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. 
Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From dikonoor at in.ibm.com Mon Aug 10 07:54:21 2020 From: dikonoor at in.ibm.com (Divya K Konoor) Date: Mon, 10 Aug 2020 13:24:21 +0530 Subject: [openstack-community] Keystone and DBNonExistent Errors In-Reply-To: References: Message-ID: Hi, I am using OpenStack Keystone Stein and run into the below error often where Keystone public process(listening to 5000) is running inside Apache httpd runs into the below. This problem is resolved with a restart of httpd service. Has anyone run into a similar issue ? This is seen soon after httpd is restarted and does not happen all the time. My environment has MariaDB backend. This problem is not limited to the assignment table and is seen across all other tables in Keystone. MariaDB service is functional and all the tables are in place. [Fri Aug 07 08:20:59.936087 2020] [:info] [pid 1420287] mod_wsgi (pid=1420287, process='keystone-public', application=''): Loading WSGI script '/usr/bin/keystone-wsgi-public'. [Fri Aug 07 08:20:59.936089 2020] [:info] [pid 1420288] mod_wsgi (pid=1420288, process='keystone-admin', application=''): Loading WSGI script '/usr/bin/keystone-wsgi-admin'. [Fri Aug 07 08:20:59.943431 2020] [ssl:info] [pid 1420290] [client 1.2.3.95:35762] AH01964: Connection to child 1 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:00.009317 2020] [ssl:info] [pid 1420291] [client 1.2.3.113:60132] AH01964: Connection to child 2 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:01.243594 2020] [ssl:info] [pid 1420289] [client 1.2.3.50:53996] AH01964: Connection to child 0 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:01.386329 2020] [ssl:info] [pid 1420293] [client x.x.x.x:38645] AH01964: Connection to child 4 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:01.824041 2020] [ssl:info] [pid 1420349] [client 1.2.3.101:42974] AH01964: Connection to child 5 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:02.949166 2020] [ssl:info] [pid 1420378] [client 1.2.3.50:54014] AH01964: Connection to child 9 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:02.949172 2020] [ssl:info] [pid 1420379] [client 1.2.3.80:46924] AH01964: Connection to child 10 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:03.286057 2020] [:info] [pid 1420287] mod_wsgi (pid=1420287): Create interpreter '1.2.3.50:5000|'. [Fri Aug 07 08:21:03.287286 2020] [:info] [pid 1420287] [remote 1.2.3.95:156] mod_wsgi (pid=1420287, process='keystone-public', application='1.2.3.50:5000|'): Loading WSGI script '/usr/bin/keystone-wsgi-public'. 
[Fri Aug 07 08:21:04.675059 2020] [ssl:info] [pid 1420436] [client 1.2.3.50:54032] AH01964: Connection to child 12 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:04.705975 2020] [ssl:info] [pid 1420437] [client 1.2.3.107:59554] AH01964: Connection to child 13 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:06.960940 2020] [ssl:info] [pid 1420438] [client 1.2.3.80:46970] AH01964: Connection to child 14 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:07.661670 2020] [ssl:info] [pid 1420349] [client 1.2.3.50:54124] AH01964: Connection to child 5 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:07.683383 2020] [ssl:info] [pid 1420292] [client x.x.x.x:30065] AH01964: Connection to child 3 established (server 1.2.3.50:5000) [Fri Aug 07 08:21:08.442956 2020] [:error] [pid 1420287] [remote x.x.x.x:144] mod_wsgi (pid=1420287): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'. [Fri Aug 07 08:21:08.443002 2020] [:error] [pid 1420287] [remote x.x.x.x:144] Traceback (most recent call last): [Fri Aug 07 08:21:08.443017 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 2309, in __call__ [Fri Aug 07 08:21:08.443509 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.wsgi_app(environ, start_response) [Fri Aug 07 08:21:08.443525 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/werkzeug/contrib/fixers.py", line 152, in __call__ [Fri Aug 07 08:21:08.443630 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.app(environ, start_response) [Fri Aug 07 08:21:08.443644 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.443746 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 07 08:21:08.443756 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.443773 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.443781 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__ [Fri Aug 07 08:21:08.443844 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = req.get_response(self.application) [Fri Aug 07 08:21:08.443859 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1314, in send [Fri Aug 07 08:21:08.444194 2020] [:error] [pid 1420287] [remote x.x.x.x:144] application, catch_exc_info=False) [Fri Aug 07 08:21:08.444203 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1278, in call_application [Fri Aug 07 08:21:08.444220 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_iter = application(self.environ, start_response) [Fri Aug 07 08:21:08.444229 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 143, in __call__ [Fri Aug 07 08:21:08.444245 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return resp(environ, start_response) [Fri Aug 07 08:21:08.444253 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.444268 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 
07 08:21:08.444276 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.444292 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.444300 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__ [Fri Aug 07 08:21:08.444315 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = req.get_response(self.application) [Fri Aug 07 08:21:08.444323 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1314, in send [Fri Aug 07 08:21:08.444338 2020] [:error] [pid 1420287] [remote x.x.x.x:144] application, catch_exc_info=False) [Fri Aug 07 08:21:08.444346 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1278, in call_application [Fri Aug 07 08:21:08.444361 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_iter = application(self.environ, start_response) [Fri Aug 07 08:21:08.444370 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.444385 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 07 08:21:08.444393 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.444408 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.444416 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/osprofiler/web.py", line 112, in __call__ [Fri Aug 07 08:21:08.444476 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return request.get_response(self.application) [Fri Aug 07 08:21:08.444485 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1314, in send [Fri Aug 07 08:21:08.444501 2020] [:error] [pid 1420287] [remote x.x.x.x:144] application, catch_exc_info=False) [Fri Aug 07 08:21:08.444509 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1278, in call_application [Fri Aug 07 08:21:08.444524 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_iter = application(self.environ, start_response) [Fri Aug 07 08:21:08.444533 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.444547 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 07 08:21:08.444556 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.444571 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.444587 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/oslo_middleware/request_id.py", line 58, in __call__ [Fri Aug 07 08:21:08.444636 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = req.get_response(self.application) [Fri Aug 07 08:21:08.444645 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1314, in send [Fri Aug 07 08:21:08.444660 2020] [:error] [pid 
1420287] [remote x.x.x.x:144] application, catch_exc_info=False) [Fri Aug 07 08:21:08.444669 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1278, in call_application [Fri Aug 07 08:21:08.444684 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_iter = application(self.environ, start_response) [Fri Aug 07 08:21:08.444698 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/server/flask/request_processing/middleware/url_normalize.py", line 38, in __call__ [Fri Aug 07 08:21:08.444750 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.app(environ, start_response) [Fri Aug 07 08:21:08.444759 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.444774 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 07 08:21:08.444783 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.444797 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.444810 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 333, in __call__ [Fri Aug 07 08:21:08.444828 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = req.get_response(self._app) [Fri Aug 07 08:21:08.444836 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1314, in send [Fri Aug 07 08:21:08.444851 2020] [:error] [pid 1420287] [remote x.x.x.x:144] application, catch_exc_info=False) [Fri Aug 07 08:21:08.444859 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1278, in call_application [Fri Aug 07 08:21:08.444874 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_iter = application(self.environ, start_response) [Fri Aug 07 08:21:08.444883 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__ [Fri Aug 07 08:21:08.444897 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = self.call_func(req, *args, **kw) [Fri Aug 07 08:21:08.444906 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func [Fri Aug 07 08:21:08.444920 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.func(req, *args, **kwargs) [Fri Aug 07 08:21:08.444929 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__ [Fri Aug 07 08:21:08.444944 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = req.get_response(self.application) [Fri Aug 07 08:21:08.444952 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1314, in send [Fri Aug 07 08:21:08.444967 2020] [:error] [pid 1420287] [remote x.x.x.x:144] application, catch_exc_info=False) [Fri Aug 07 08:21:08.444975 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1278, in call_application [Fri Aug 07 08:21:08.444990 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_iter = application(self.environ, start_response) [Fri Aug 07 08:21:08.444998 2020] [:error] [pid 
1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/werkzeug/wsgi.py", line 826, in __call__ [Fri Aug 07 08:21:08.445279 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return app(environ, start_response) [Fri Aug 07 08:21:08.445288 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 2295, in wsgi_app [Fri Aug 07 08:21:08.445304 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = self.handle_exception(e) [Fri Aug 07 08:21:08.445316 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445490 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445500 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445516 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445524 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445539 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445547 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445562 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445570 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445585 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445593 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445608 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445616 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445630 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445639 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445654 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445662 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445676 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445685 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445699 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445708 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445722 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 
08:21:08.445731 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445745 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445758 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445772 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445780 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445795 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445803 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445818 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445826 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445841 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445849 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445863 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445871 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445886 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445894 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445908 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445917 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445931 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445939 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445954 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445962 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445976 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.445984 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.445999 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446007 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446021 2020] [:error] [pid 1420287] [remote x.x.x.x:144] 
return original_handler(e) [Fri Aug 07 08:21:08.446030 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446044 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446052 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446067 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446075 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446090 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446098 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446113 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446121 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446135 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446143 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446158 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446166 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 1741, in handle_exception [Fri Aug 07 08:21:08.446181 2020] [:error] [pid 1420287] [remote x.x.x.x:144] reraise(exc_type, exc_value, tb) [Fri Aug 07 08:21:08.446190 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 266, in error_router [Fri Aug 07 08:21:08.446204 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.handle_error(e) [Fri Aug 07 08:21:08.446212 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 2292, in wsgi_app [Fri Aug 07 08:21:08.446227 2020] [:error] [pid 1420287] [remote x.x.x.x:144] response = self.full_dispatch_request() [Fri Aug 07 08:21:08.446236 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 1815, in full_dispatch_request [Fri Aug 07 08:21:08.446250 2020] [:error] [pid 1420287] [remote x.x.x.x:144] rv = self.handle_user_exception(e) [Fri Aug 07 08:21:08.446259 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446273 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446282 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446296 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446304 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446318 2020] [:error] 
[pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446327 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446341 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446349 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446363 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446372 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446386 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446395 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446409 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446417 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446432 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446440 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446454 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446463 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446477 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446485 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446500 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446508 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446522 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446531 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446545 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446553 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446568 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446576 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446590 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446599 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 
07 08:21:08.446613 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446621 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446636 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446644 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446658 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446667 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446681 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446689 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446704 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446712 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446727 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446735 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446749 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446757 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446772 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446780 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446795 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446803 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446817 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446826 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446840 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446848 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446863 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446871 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446885 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446893 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", 
line 269, in error_router [Fri Aug 07 08:21:08.446908 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446916 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 269, in error_router [Fri Aug 07 08:21:08.446930 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return original_handler(e) [Fri Aug 07 08:21:08.446938 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 1718, in handle_user_exception [Fri Aug 07 08:21:08.446953 2020] [:error] [pid 1420287] [remote x.x.x.x:144] reraise(exc_type, exc_value, tb) [Fri Aug 07 08:21:08.446962 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 266, in error_router [Fri Aug 07 08:21:08.446976 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.handle_error(e) [Fri Aug 07 08:21:08.446984 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 1813, in full_dispatch_request [Fri Aug 07 08:21:08.446999 2020] [:error] [pid 1420287] [remote x.x.x.x:144] rv = self.dispatch_request() [Fri Aug 07 08:21:08.447007 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/app.py", line 1799, in dispatch_request [Fri Aug 07 08:21:08.447022 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.view_functions[rule.endpoint](**req.view_args) [Fri Aug 07 08:21:08.447031 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 458, in wrapper [Fri Aug 07 08:21:08.447046 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = resource(*args, **kwargs) [Fri Aug 07 08:21:08.447055 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask/views.py", line 88, in view [Fri Aug 07 08:21:08.447119 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self.dispatch_request(*args, **kwargs) [Fri Aug 07 08:21:08.447128 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/flask_restful/__init__.py", line 573, in dispatch_request [Fri Aug 07 08:21:08.447144 2020] [:error] [pid 1420287] [remote x.x.x.x:144] resp = meth(*args, **kwargs) [Fri Aug 07 08:21:08.447152 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/server/flask/common.py", line 1060, in wrapper [Fri Aug 07 08:21:08.447392 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return f(*args, **kwargs) [Fri Aug 07 08:21:08.447406 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/api/auth.py", line 312, in post [Fri Aug 07 08:21:08.447551 2020] [:error] [pid 1420287] [remote x.x.x.x:144] token = authentication.authenticate_for_token(auth_data) [Fri Aug 07 08:21:08.447561 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/api/_shared/authentication.py", line 229, in authenticate_for_token [Fri Aug 07 08:21:08.447652 2020] [:error] [pid 1420287] [remote x.x.x.x:144] app_cred_id=app_cred_id, parent_audit_id=token_audit_id) [Fri Aug 07 08:21:08.447662 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 116, in wrapped [Fri Aug 07 08:21:08.447679 2020] [:error] [pid 1420287] [remote x.x.x.x:144] __ret_val = __f(*args, **kwargs) [Fri 
Aug 07 08:21:08.447687 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/token/provider.py", line 252, in issue_token [Fri Aug 07 08:21:08.447706 2020] [:error] [pid 1420287] [remote x.x.x.x:144] token.mint(token_id, issued_at) [Fri Aug 07 08:21:08.447714 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/models/token_model.py", line 563, in mint [Fri Aug 07 08:21:08.448498 2020] [:error] [pid 1420287] [remote x.x.x.x:144] self._validate_project_scope() [Fri Aug 07 08:21:08.448508 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/models/token_model.py", line 512, in _validate_project_scope [Fri Aug 07 08:21:08.448525 2020] [:error] [pid 1420287] [remote x.x.x.x:144] if self.project_scoped and not self.roles: [Fri Aug 07 08:21:08.448533 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/models/token_model.py", line 438, in roles [Fri Aug 07 08:21:08.448549 2020] [:error] [pid 1420287] [remote x.x.x.x:144] roles = self._get_project_roles() [Fri Aug 07 08:21:08.448557 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/models/token_model.py", line 400, in _get_project_roles [Fri Aug 07 08:21:08.448573 2020] [:error] [pid 1420287] [remote x.x.x.x:144] self.user_id, self.project_id [Fri Aug 07 08:21:08.448581 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 116, in wrapped [Fri Aug 07 08:21:08.448597 2020] [:error] [pid 1420287] [remote x.x.x.x:144] __ret_val = __f(*args, **kwargs) [Fri Aug 07 08:21:08.448605 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 1220, in decorate [Fri Aug 07 08:21:08.449478 2020] [:error] [pid 1420287] [remote x.x.x.x:144] should_cache_fn) [Fri Aug 07 08:21:08.449488 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 825, in get_or_create [Fri Aug 07 08:21:08.449504 2020] [:error] [pid 1420287] [remote x.x.x.x:144] async_creator) as value: [Fri Aug 07 08:21:08.449512 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/lock.py", line 154, in __enter__ [Fri Aug 07 08:21:08.449967 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self._enter() [Fri Aug 07 08:21:08.449977 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/lock.py", line 94, in _enter [Fri Aug 07 08:21:08.449995 2020] [:error] [pid 1420287] [remote x.x.x.x:144] generated = self._enter_create(createdtime) [Fri Aug 07 08:21:08.450004 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/lock.py", line 145, in _enter_create [Fri Aug 07 08:21:08.450020 2020] [:error] [pid 1420287] [remote x.x.x.x:144] created = self.creator() [Fri Aug 07 08:21:08.450029 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 792, in gen_value [Fri Aug 07 08:21:08.450046 2020] [:error] [pid 1420287] [remote x.x.x.x:144] created_value = creator() [Fri Aug 07 08:21:08.450055 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/dogpile/cache/region.py", line 1216, in creator [Fri Aug 07 08:21:08.450071 2020] [:error] [pid 1420287] [remote x.x.x.x:144] 
return fn(*arg, **kw) [Fri Aug 07 08:21:08.450080 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 128, in get_roles_for_user_and_project [Fri Aug 07 08:21:08.450975 2020] [:error] [pid 1420287] [remote x.x.x.x:144] user_id=user_id, project_id=project_id, effective=True) [Fri Aug 07 08:21:08.450985 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 116, in wrapped [Fri Aug 07 08:21:08.451001 2020] [:error] [pid 1420287] [remote x.x.x.x:144] __ret_val = __f(*args, **kwargs) [Fri Aug 07 08:21:08.451009 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 999, in list_role_assignments [Fri Aug 07 08:21:08.451025 2020] [:error] [pid 1420287] [remote x.x.x.x:144] strip_domain_roles) [Fri Aug 07 08:21:08.451033 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 845, in _list_effective_role_assignments [Fri Aug 07 08:21:08.451049 2020] [:error] [pid 1420287] [remote x.x.x.x:144] domain_id=domain_id, inherited=inherited) [Fri Aug 07 08:21:08.451057 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/core.py", line 780, in list_role_assignments_for_actor [Fri Aug 07 08:21:08.451072 2020] [:error] [pid 1420287] [remote x.x.x.x:144] group_ids=group_ids, inherited_to_projects=False) [Fri Aug 07 08:21:08.451081 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib/python2.7/site-packages/keystone/assignment/backends/sql.py", line 248, in list_role_assignments [Fri Aug 07 08:21:08.451599 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return [denormalize_role(ref) for ref in query.all()] [Fri Aug 07 08:21:08.451609 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2925, in all [Fri Aug 07 08:21:08.451632 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return list(self) [Fri Aug 07 08:21:08.451641 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3081, in __iter__ [Fri Aug 07 08:21:08.451656 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return self._execute_and_instances(context) [Fri Aug 07 08:21:08.451665 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3106, in _execute_and_instances [Fri Aug 07 08:21:08.451683 2020] [:error] [pid 1420287] [remote x.x.x.x:144] result = conn.execute(querycontext.statement, self._params) [Fri Aug 07 08:21:08.451691 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 980, in execute [Fri Aug 07 08:21:08.451711 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return meth(self, multiparams, params) [Fri Aug 07 08:21:08.451720 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection [Fri Aug 07 08:21:08.451736 2020] [:error] [pid 1420287] [remote x.x.x.x:144] return connection._execute_clauseelement(self, multiparams, params) [Fri Aug 07 08:21:08.451745 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement [Fri Aug 07 
08:21:08.451762 2020] [:error] [pid 1420287] [remote x.x.x.x:144] distilled_params, [Fri Aug 07 08:21:08.451771 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context [Fri Aug 07 08:21:08.451786 2020] [:error] [pid 1420287] [remote x.x.x.x:144] e, statement, parameters, cursor, context [Fri Aug 07 08:21:08.451795 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1456, in _handle_dbapi_exception [Fri Aug 07 08:21:08.451810 2020] [:error] [pid 1420287] [remote x.x.x.x:144] util.raise_from_cause(newraise, exc_info) [Fri Aug 07 08:21:08.451818 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause [Fri Aug 07 08:21:08.451834 2020] [:error] [pid 1420287] [remote x.x.x.x:144] reraise(type(exception), exception, tb=exc_tb, cause=cause) [Fri Aug 07 08:21:08.451843 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context [Fri Aug 07 08:21:08.451858 2020] [:error] [pid 1420287] [remote x.x.x.x:144] cursor, statement, parameters, context [Fri Aug 07 08:21:08.451866 2020] [:error] [pid 1420287] [remote x.x.x.x:144] File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute [Fri Aug 07 08:21:08.451882 2020] [:error] [pid 1420287] [remote x.x.x.x:144] cursor.execute(statement, parameters) [Fri Aug 07 08:21:08.451923 2020] [:error] [pid 1420287] [remote x.x.x.x:144] DBNonExistentTable: (sqlite3.OperationalError) no such table: assignment [SQL: u'SELECT assignment.type AS assignment_type, assignment.actor_id AS assignment_actor_id, assignment.target_id AS assignment_target_id, assignment.role_id AS assignment_role_id, assignment.inherited AS assignment_inherited \\nFROM assignment \\nWHERE assignment.actor_id IN (?) AND assignment.target_id IN (?) AND assignment.type IN (?) AND assignment.inherited = 0'] [parameters: ('15c2fe91e053af57a997c568c117c908d59c138f996bdc19ae97e9f16df12345', '12345978536e45ab8a279e2b0fa4f947', 'UserProject')] (Background on this error at: http://sqlalche.me/e/e3q8) Regards, Divya -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yan.y.zhao at intel.com Mon Aug 10 07:46:31 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Mon, 10 Aug 2020 15:46:31 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200805105319.GF2177@nanopsycho> References: <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> Message-ID: <20200810074631.GA29059@joy-OptiPlex-7040> On Wed, Aug 05, 2020 at 12:53:19PM +0200, Jiri Pirko wrote: > Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: > >On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote: > >> > >> On 2020/8/5 下午3:56, Jiri Pirko wrote: > >> > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang at redhat.com wrote: > >> > > On 2020/8/5 上午10:16, Yan Zhao wrote: > >> > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote: > >> > > > > On 2020/8/5 上午12:35, Cornelia Huck wrote: > >> > > > > > [sorry about not chiming in earlier] > >> > > > > > > >> > > > > > On Wed, 29 Jul 2020 16:05:03 +0800 > >> > > > > > Yan Zhao wrote: > >> > > > > > > >> > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > >> > > > > > (...) > >> > > > > > > >> > > > > > > > Based on the feedback we've received, the previously proposed interface > >> > > > > > > > is not viable. I think there's agreement that the user needs to be > >> > > > > > > > able to parse and interpret the version information. Using json seems > >> > > > > > > > viable, but I don't know if it's the best option. Is there any > >> > > > > > > > precedent of markup strings returned via sysfs we could follow? > >> > > > > > I don't think encoding complex information in a sysfs file is a viable > >> > > > > > approach. Quoting Documentation/filesystems/sysfs.rst: > >> > > > > > > >> > > > > > "Attributes should be ASCII text files, preferably with only one value > >> > > > > > per file. It is noted that it may not be efficient to contain only one > >> > > > > > value per file, so it is socially acceptable to express an array of > >> > > > > > values of the same type. > >> > > > > > Mixing types, expressing multiple lines of data, and doing fancy > >> > > > > > formatting of data is heavily frowned upon." > >> > > > > > > >> > > > > > Even though this is an older file, I think these restrictions still > >> > > > > > apply. > >> > > > > +1, that's another reason why devlink(netlink) is better. > >> > > > > > >> > > > hi Jason, > >> > > > do you have any materials or sample code about devlink, so we can have a good > >> > > > study of it? > >> > > > I found some kernel docs about it but my preliminary study didn't show me the > >> > > > advantage of devlink. > >> > > > >> > > CC Jiri and Parav for a better answer for this. > >> > > > >> > > My understanding is that the following advantages are obvious (as I replied > >> > > in another thread): > >> > > > >> > > - existing users (NIC, crypto, SCSI, ib), mature and stable > >> > > - much better error reporting (ext_ack other than string or errno) > >> > > - namespace aware > >> > > - do not couple with kobject > >> > Jason, what is your use case? > >> > >> > >> I think the use case is to report device compatibility for live migration. 
> >> Yan proposed a simple sysfs based migration version first, but it looks not
> >> sufficient and something based on JSON is discussed.
> >>
> >> Yan, can you help to summarize the discussion so far for Jiri as a
> >> reference?
> >>
> >yes.
> >we are currently defining a device live migration compatibility
> >interface in order to let user space like openstack and libvirt know
> >which two devices are live migration compatible.
> >currently the devices include mdev (a kernel emulated virtual device)
> >and physical devices (e.g. a VF of a PCI SRIOV device).
> >
> >the attributes we want user space to compare include
> >common attributes:
> >  device_api: vfio-pci, vfio-ccw...
> >  mdev_type: mdev type of mdev or similar signature for physical device.
> >             It specifies a device's hardware capability. e.g.
> >             i915-GVTg_V5_4 means it's 1/4 of a gen9 Intel graphics
> >             device.
> >  software_version: device driver's version.
> >             in <major>.<minor>[.bugfix] scheme, where there is no
> >             compatibility across major versions, minor versions have
> >             forward compatibility (ex. 1 -> 2 is ok, 2 -> 1 is not) and
> >             bugfix version number indicates some degree of internal
> >             improvement that is not visible to the user in terms of
> >             features or compatibility,
> >
> >vendor specific attributes: each vendor may define different attributes
> >  device id: device id of a physical device or mdev's parent pci device.
> >             it could be equal to pci id for pci devices
> >  aggregator: used together with mdev_type. e.g. aggregator=2 together
> >             with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> >             graphics device.
> >  remote_url: for a local NVMe VF, it may be configured with a remote
> >             url of a remote storage and all data is stored in the
> >             remote side specified by the remote url.
> >  ...
> >
> >Comparing those attributes by user space alone is not an easy job, as it
> >can't simply assume an equal relationship between source attributes and
> >target attributes. e.g.
> >for a source device of mdev_type=i915-GVTg_V5_4,aggregator=2 (1/2 of
> >gen9), it actually could find a compatible device of
> >mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9),
> >if mdev_type of i915-GVTg_V5_4 is not available in the target machine.
> >
> >So, in our current proposal, we want to create two sysfs attributes
> >under a device sysfs node.
> >/sys/<device>/migration/self
> >/sys/<device>/migration/compatible
> >
> >#cat /sys/<device>/migration/self
> >device_type=vfio_pci
> >mdev_type=i915-GVTg_V5_4
> >device_id=8086591d
> >aggregator=2
> >software_version=1.0.0
> >
> >#cat /sys/<device>/migration/compatible
> >device_type=vfio_pci
> >mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
> >device_id=8086591d
> >aggregator={val1}/2
> >software_version=1.0.0
> >
> >The /sys/<device>/migration/self specifies self attributes of
> >a device.
> >The /sys/<device>/migration/compatible specifies the list of
> >compatible devices of a device. As in the example, compatible devices
> >could have
> >  device_type == vfio_pci &&
> >  device_id == 8086591d &&
> >  software_version == 1.0.0 &&
> >  (
> >   (mdev_type of i915-GVTg_V5_2 && aggregator==1) ||
> >   (mdev_type of i915-GVTg_V5_4 && aggregator==2) ||
> >   (mdev_type of i915-GVTg_V5_8 && aggregator==4)
> >  )
> >
> >by comparing whether a target device is in the compatible list of the source
> >device, user space can know whether two devices are live migration
> >compatible.
> >
> >Additional notes:
> >1) software_version in the compatible list may not be necessary as it
> >already has a major.minor.bugfix scheme.
> >2)for vendor attribute like remote_url, it may not be statically > >assigned and could be changed with a device interface. > > > >So, as Cornelia pointed that it's not good to use complex format in > >a sysfs attribute, we'd like to know whether there're other good ways to > >our use case, e.g. splitting a single attribute to multiple simple sysfs > >attributes as what Cornelia suggested or devlink that Jason has strongly > >recommended. > > Hi Yan. > Hi Jiri, > Thanks for the explanation, I'm still fuzzy about the details. > Anyway, I suggest you to check "devlink dev info" command we have > implemented for multiple drivers. You can try netdevsim to test this. > I think that the info you need to expose might be put there. do you mean drivers/net/netdevsim/ ? > > Devlink creates instance per-device. Specific device driver calls into > devlink core to create the instance. What device do you have? What the devlink core is net/core/devlink.c ? > driver is it handled by? It looks that the devlink is for network device specific, and in devlink.h, it says include/uapi/linux/devlink.h - Network physical device Netlink interface, I feel like it's not very appropriate for a GPU driver to use this interface. Is that right? Thanks Yan From monika.samal at outlook.com Mon Aug 10 11:12:31 2020 From: monika.samal at outlook.com (Monika Samal) Date: Mon, 10 Aug 2020 11:12:31 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: Hey Fabian, I tried creating and testing instance with my available subnet created for loadbalancer , I am not able to ping it. Please find below ip a output for controller and deployment node: Controller Node: 30.0.0.14 [cid:18b1d2b8-1adf-45f1-9dd6-b185223e060e] Deployment Node: 30.0.0.11 [cid:20b35bae-677f-462c-8c38-7fafdb058219] [cid:66fb7ccf-ba53-4662-a0b8-b129da885f1a] ________________________________ From: Fabian Zimmermann Sent: Monday, August 10, 2020 11:19 AM To: Michael Johnson Cc: Monika Samal ; openstack-discuss ; Mark Goddard Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Hi, to test your connection you can create an instance im the octavia network and try to ping/ssh from your controller (dont forget a suitable security group) Fabian Michael Johnson > schrieb am Mo., 10. Aug. 2020, 07:44: That looks like there is still a kolla networking issue where the amphora are not able to reach the controller processes. Please fix the lb-mgmt-net such that it can reach the amphora and the controller containers. This should be setup via the deployment tool, kolla in this case. Michael On Sun, Aug 9, 2020 at 5:02 AM Monika Samal > wrote: Hi All, Below is the error am getting, i tried configuring network issue as well still finding it difficult to resolve. Below is my log...if somebody can help me resolving it..it would be great help since its very urgent... 
http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ Regards, Monika ________________________________ From: Monika Samal > Sent: Sunday, 9 August, 2020, 5:29 pm To: Mark Goddard; Michael Johnson; openstack-discuss Cc: Fabian Zimmermann Subject: Re: [openstack-community] Octavia :; Unable to create load balancer ________________________________ From: Monika Samal > Sent: Friday, August 7, 2020 4:41:52 AM To: Mark Goddard >; Michael Johnson > Cc: Fabian Zimmermann >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer I tried following above document still facing same Octavia connection error with amphora image. Regards, Monika ________________________________ From: Mark Goddard > Sent: Thursday, August 6, 2020 1:16:01 PM To: Michael Johnson > Cc: Monika Samal >; Fabian Zimmermann >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer On Wed, 5 Aug 2020 at 16:16, Michael Johnson > wrote: Looking at that error, it appears that the lb-mgmt-net is not setup correctly. The Octavia controller containers are not able to reach the amphora instances on the lb-mgmt-net subnet. I don't know how kolla is setup to connect the containers to the neutron lb-mgmt-net network. Maybe the above documents will help with that. Right now it's up to the operator to configure that. The kolla documentation doesn't prescribe any particular setup. We're working on automating it in Victoria. Michael On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard > wrote: On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: Hello Guys, With Michaels help I was able to solve the problem but now there is another error I was able to create my network on vlan but still error persist. PFB the logs: http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ Kindly help regards, Monika ________________________________ From: Michael Johnson > Sent: Monday, August 3, 2020 9:10 PM To: Fabian Zimmermann > Cc: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Yeah, it looks like nova is failing to boot the instance. Check this setting in your octavia.conf files: https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id Also, if kolla-ansible didn't set both of these values correctly, please open bug reports for kolla-ansible. These all should have been configured by the deployment tool. I wasn't following this thread due to no [kolla] tag, but here are the recently added docs for Octavia in kolla [1]. Note the octavia_service_auth_project variable which was added to migrate from the admin project to the service project for octavia resources. We're lacking proper automation for the flavor, image etc, but it is being worked on in Victoria [2]. [1] https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html [2] https://review.opendev.org/740180 Michael On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: Seems like the flavor is missing or empty '' - check for typos and enable debug. Check if the nova req contains valid information/flavor. Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 15:46: It's registered Get Outlook for Android ________________________________ From: Fabian Zimmermann > Sent: Monday, August 3, 2020 7:08:21 PM To: Monika Samal >; openstack-discuss > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Did you check the (nova) flavor you use in octavia. 
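For example, something like the following (the flavor ID is whatever amp_flavor_id in your octavia.conf points to, so adjust it to your deployment):

  openstack flavor show <amp_flavor_id>
  openstack flavor list --all

If that flavor does not exist, or amp_flavor_id is empty, nova will reject the boot request for the amphora.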
Fabian Monika Samal > schrieb am Mo., 3. Aug. 2020, 10:53: After Michael suggestion I was able to create load balancer but there is error in status. [X] PFB the error link: http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ ________________________________ From: Monika Samal > Sent: Monday, August 3, 2020 2:08 PM To: Michael Johnson > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thanks a ton Michael for helping me out ________________________________ From: Michael Johnson > Sent: Friday, July 31, 2020 3:57 AM To: Monika Samal > Cc: Fabian Zimmermann >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal > schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 97899 bytes Desc: image.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 22490 bytes Desc: image.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 24398 bytes Desc: image.png URL: From juliaashleykreger at gmail.com Mon Aug 10 16:17:41 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 10 Aug 2020 09:17:41 -0700 Subject: [Ironic] User Survey question Message-ID: Greetings awesome Artificial Intelligences and fellow humanoid carbon units! This week I need to submit the question for the 2021 user survey. We discussed this some during our weekly IRC meeting today.[0] Presently, the question is: "Ironic: What would you find most useful if it was part of ironic?" 
I'd like to propose we collect more data in order to enable us to make informed decisions for features and maintenance work moving forward. While this is long term thinking, I'm wondering if operators would be interested in collecting and submitting some basic data or using a tool, to submit anonymous usage data so we can gain insight into hardware types in use, numbers of machines, which interfaces are used, etc. So I'm thinking something along the lines of: "Ironic: Would you be willing to submit anonymous usage statistics (Number of nodes, conductors, which drivers are in use, etc) if such a tool existed? Yes/No/Not Applicable" Thoughts? Feelings? Concerns? Other ideas? -Julia [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html From dev.faz at gmail.com Mon Aug 10 16:57:57 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 10 Aug 2020 18:57:57 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Hi, Check if the vlan of eth1 is reachable by the compute nodes on the nic connected to br-ex. Is the lb-net created with the correct vlan-id so the traffic is able to flow from the nic to br-ex to the instance? Proof this with tcpdump (-e) Fabian connected to the Monika Samal schrieb am Mo., 10. Aug. 2020, 13:12: > Hey Fabian, > > I tried creating and testing instance with my available subnet created for > loadbalancer , I am not able to ping it. > > Please find below ip a output for controller and deployment node: > > Controller Node: 30.0.0.14 > > Deployment Node: 30.0.0.11 > > > > ------------------------------ > *From:* Fabian Zimmermann > *Sent:* Monday, August 10, 2020 11:19 AM > *To:* Michael Johnson > *Cc:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org>; Mark Goddard > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Hi, > > to test your connection you can create an instance im the octavia network > and try to ping/ssh from your controller (dont forget a suitable security > group) > > Fabian > > Michael Johnson schrieb am Mo., 10. Aug. 2020, > 07:44: > > > That looks like there is still a kolla networking issue where the amphora > are not able to reach the controller processes. Please fix the lb-mgmt-net > such that it can reach the amphora and the controller containers. This > should be setup via the deployment tool, kolla in this case. > > Michael > > On Sun, Aug 9, 2020 at 5:02 AM Monika Samal > wrote: > > Hi All, > > Below is the error am getting, i tried configuring network issue as well > still finding it difficult to resolve. > > Below is my log...if somebody can help me resolving it..it would be great > help since its very urgent... 
> > http://paste.openstack.org/show/TsagcQX2ZKd6rhhsYcYd/ > > Regards, > Monika > ------------------------------ > *From:* Monika Samal > *Sent:* Sunday, 9 August, 2020, 5:29 pm > *To:* Mark Goddard; Michael Johnson; openstack-discuss > *Cc:* Fabian Zimmermann > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > ------------------------------ > *From:* Monika Samal > *Sent:* Friday, August 7, 2020 4:41:52 AM > *To:* Mark Goddard ; Michael Johnson < > johnsomor at gmail.com> > *Cc:* Fabian Zimmermann ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > I tried following above document still facing same Octavia connection > error with amphora image. > > Regards, > Monika > ------------------------------ > *From:* Mark Goddard > *Sent:* Thursday, August 6, 2020 1:16:01 PM > *To:* Michael Johnson > *Cc:* Monika Samal ; Fabian Zimmermann < > dev.faz at gmail.com>; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > On Wed, 5 Aug 2020 at 16:16, Michael Johnson wrote: > > Looking at that error, it appears that the lb-mgmt-net is not setup > correctly. The Octavia controller containers are not able to reach the > amphora instances on the lb-mgmt-net subnet. > > I don't know how kolla is setup to connect the containers to the neutron > lb-mgmt-net network. Maybe the above documents will help with that. > > > Right now it's up to the operator to configure that. The kolla > documentation doesn't prescribe any particular setup. We're working on > automating it in Victoria. > > > Michael > > On Wed, Aug 5, 2020 at 12:53 AM Mark Goddard wrote: > > > > On Tue, 4 Aug 2020 at 16:58, Monika Samal > wrote: > > Hello Guys, > > With Michaels help I was able to solve the problem but now there is > another error I was able to create my network on vlan but still error > persist. PFB the logs: > > http://paste.openstack.org/show/fEixSudZ6lzscxYxsG1z/ > > Kindly help > > regards, > Monika > ------------------------------ > *From:* Michael Johnson > *Sent:* Monday, August 3, 2020 9:10 PM > *To:* Fabian Zimmermann > *Cc:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Yeah, it looks like nova is failing to boot the instance. > > Check this setting in your octavia.conf files: > https://docs.openstack.org/octavia/latest/configuration/configref.html#controller_worker.amp_flavor_id > > Also, if kolla-ansible didn't set both of these values correctly, please > open bug reports for kolla-ansible. These all should have been configured > by the deployment tool. > > > I wasn't following this thread due to no [kolla] tag, but here are the > recently added docs for Octavia in kolla [1]. Note > the octavia_service_auth_project variable which was added to migrate from > the admin project to the service project for octavia resources. We're > lacking proper automation for the flavor, image etc, but it is being worked > on in Victoria [2]. > > [1] > https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html > [2] https://review.opendev.org/740180 > > Michael > > On Mon, Aug 3, 2020 at 7:53 AM Fabian Zimmermann > wrote: > > Seems like the flavor is missing or empty '' - check for typos and enable > debug. 
> > Check if the nova req contains valid information/flavor. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 15:46: > > It's registered > > Get Outlook for Android > ------------------------------ > *From:* Fabian Zimmermann > *Sent:* Monday, August 3, 2020 7:08:21 PM > *To:* Monika Samal ; openstack-discuss < > openstack-discuss at lists.openstack.org> > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Did you check the (nova) flavor you use in octavia. > > Fabian > > Monika Samal schrieb am Mo., 3. Aug. 2020, > 10:53: > > After Michael suggestion I was able to create load balancer but there is > error in status. > > > > PFB the error link: > > http://paste.openstack.org/show/meNZCeuOlFkfjj189noN/ > ------------------------------ > *From:* Monika Samal > *Sent:* Monday, August 3, 2020 2:08 PM > *To:* Michael Johnson > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Thanks a ton Michael for helping me out > ------------------------------ > *From:* Michael Johnson > *Sent:* Friday, July 31, 2020 3:57 AM > *To:* Monika Samal > *Cc:* Fabian Zimmermann ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > Just to close the loop on this, the octavia.conf file had > "project_name = admin" instead of "project_name = service" in the > [service_auth] section. This was causing the keystone errors when > Octavia was communicating with neutron. > > I don't know if that is a bug in kolla-ansible or was just a local > configuration issue. > > Michael > > On Thu, Jul 30, 2020 at 1:39 PM Monika Samal > wrote: > > > > Hello Fabian,, > > > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > > > Regards, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:57 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > Hi, > > > > just to debug, could you replace the auth_type password with v3password? > > > > And do a curl against your :5000 and :35357 urls and paste the output. > > > > Fabian > > > > Monika Samal schrieb am Do., 30. Juli 2020, > 22:15: > > > > Hello Fabian, > > > > http://paste.openstack.org/show/796477/ > > > > Thanks, > > Monika > > ________________________________ > > From: Fabian Zimmermann > > Sent: Friday, July 31, 2020 1:38 AM > > To: Monika Samal > > Cc: Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > > Subject: Re: [openstack-community] Octavia :; Unable to create load > balancer > > > > The sections should be > > > > service_auth > > keystone_authtoken > > > > if i read the docs correctly. Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > > > Fabian > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 97899 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 22490 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 24398 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 24398 bytes Desc: not available URL: From opensrloo at gmail.com Mon Aug 10 17:28:46 2020 From: opensrloo at gmail.com (Ruby Loo) Date: Mon, 10 Aug 2020 13:28:46 -0400 Subject: [Ironic] User Survey question In-Reply-To: References: Message-ID: Hi Julia, Please remind me, are we allowed one question? I was wondering what prevents us from having this tool and then announcing/asking folks to provide the information. Or is the idea that if no one says 'yes', it would be a waste of time to provide such a tool? My concern is that if this is the only question we are allowed to ask, we might not get that much useful information. What about pain-points wrt ironic? Could we ask that? --ruby On Mon, Aug 10, 2020 at 12:23 PM Julia Kreger wrote: > Greetings awesome Artificial Intelligences and fellow humanoid carbon > units! > > This week I need to submit the question for the 2021 user survey. We > discussed this some during our weekly IRC meeting today.[0] > > Presently, the question is: > > "Ironic: What would you find most useful if it was part of ironic?" > > I'd like to propose we collect more data in order to enable us to make > informed decisions for features and maintenance work moving forward. > While this is long term thinking, I'm wondering if operators would be > interested in collecting and submitting some basic data or using a > tool, to submit anonymous usage data so we can gain insight into > hardware types in use, numbers of machines, which interfaces are used, > etc. > > So I'm thinking something along the lines of: > > "Ironic: Would you be willing to submit anonymous usage statistics > (Number of nodes, conductors, which drivers are in use, etc) if such a > tool existed? Yes/No/Not Applicable" > > Thoughts? Feelings? Concerns? Other ideas? > > -Julia > > > [0]: > http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openstack.org Mon Aug 10 17:30:48 2020 From: allison at openstack.org (Allison Price) Date: Mon, 10 Aug 2020 12:30:48 -0500 Subject: [Ironic] User Survey question In-Reply-To: References: Message-ID: <6E7AE88E-3E6A-4791-AFDB-600DB8E2AC40@openstack.org> Coming solely from the User Survey POV, each project and SIG is allowed up to 2 questions. We create that limit to ensure that the survey does not get too terribly long. If the Ironic team would like to add one question, we can. Thanks! Allison > On Aug 10, 2020, at 12:28 PM, Ruby Loo wrote: > > Hi Julia, > > Please remind me, are we allowed one question? > > I was wondering what prevents us from having this tool and then announcing/asking folks to provide the information. Or is the idea that if no one says 'yes', it would be a waste of time to provide such a tool? My concern is that if this is the only question we are allowed to ask, we might not get that much useful information. > > What about pain-points wrt ironic? Could we ask that? > > --ruby > > On Mon, Aug 10, 2020 at 12:23 PM Julia Kreger > wrote: > Greetings awesome Artificial Intelligences and fellow humanoid carbon units! > > This week I need to submit the question for the 2021 user survey. 
We > discussed this some during our weekly IRC meeting today.[0] > > Presently, the question is: > > "Ironic: What would you find most useful if it was part of ironic?" > > I'd like to propose we collect more data in order to enable us to make > informed decisions for features and maintenance work moving forward. > While this is long term thinking, I'm wondering if operators would be > interested in collecting and submitting some basic data or using a > tool, to submit anonymous usage data so we can gain insight into > hardware types in use, numbers of machines, which interfaces are used, > etc. > > So I'm thinking something along the lines of: > > "Ironic: Would you be willing to submit anonymous usage statistics > (Number of nodes, conductors, which drivers are in use, etc) if such a > tool existed? Yes/No/Not Applicable" > > Thoughts? Feelings? Concerns? Other ideas? > > -Julia > > > [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Mon Aug 10 19:13:21 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 10 Aug 2020 12:13:21 -0700 Subject: [all] Virtual PTG October 2020 Dates & Registration Message-ID: Hello Everyone! I'm sure you all have been anxiously awaiting the announcement of the dates for the next virtual PTG! After polling the community[1] and balancing the pros and cons, we have decided the PTG will take place the week after the Open Infrastructure Summit[2][3] from October 26th to October 30th, 2020. PTG registration is now open[4]. Like last time, it is free, but we will again be using it to communicate details about the event (schedules, passwords, etc), so please register! Later this week we will send out info about signing up teams. Also, the same as last time, we will have an ethercalc signup and a survey to gather some other data about your team. -the Kendalls (diablo_rojo & wendallkaters) [1] ML Poll for PTG Dates: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016098.html [2] Summit Site: https://www.openstack.org/summit/2020/ [3] Summit Registration: https://openinfrasummit2020.eventbrite.com [4] PTG Registration: https://october2020ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.bell at cern.ch Mon Aug 10 20:37:27 2020 From: tim.bell at cern.ch (timbell) Date: Mon, 10 Aug 2020 22:37:27 +0200 Subject: [Ironic] User Survey question In-Reply-To: References: Message-ID: Ceph has done something like this with good results - https://docs.ceph.com/docs/master/mgr/telemetry/ I think the things that have helped this to be successful are - easy way to see what you would send - option (not the default) to provide more details such as company and contact I think many OpenStack projects could benefit from this sort of approach for - capacity growth - rate of upgrade - support some of the user survey activities by automatically collecting data rather than asking for responses in a manual survey Would it be possible to consider an Oslo module so the infrastructure could be common and then we make it anonymous opt-in ? Tim > On 10 Aug 2020, at 18:17, Julia Kreger wrote: > > Greetings awesome Artificial Intelligences and fellow humanoid carbon units! > > This week I need to submit the question for the 2021 user survey. 
We > discussed this some during our weekly IRC meeting today.[0] > > Presently, the question is: > > "Ironic: What would you find most useful if it was part of ironic?" > > I'd like to propose we collect more data in order to enable us to make > informed decisions for features and maintenance work moving forward. > While this is long term thinking, I'm wondering if operators would be > interested in collecting and submitting some basic data or using a > tool, to submit anonymous usage data so we can gain insight into > hardware types in use, numbers of machines, which interfaces are used, > etc. > > So I'm thinking something along the lines of: > > "Ironic: Would you be willing to submit anonymous usage statistics > (Number of nodes, conductors, which drivers are in use, etc) if such a > tool existed? Yes/No/Not Applicable" > > Thoughts? Feelings? Concerns? Other ideas? > > -Julia > > > [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Mon Aug 10 20:48:09 2020 From: emilien at redhat.com (Emilien Macchi) Date: Mon, 10 Aug 2020 16:48:09 -0400 Subject: [tripleo][ansible] the current action plugins use patterns are suboptimal? In-Reply-To: <385dc8d7-198f-64ce-908f-49ab823ed229@redhat.com> References: <6feb1d83-5cc8-1916-90a7-1a6b54593310@redhat.com> <385dc8d7-198f-64ce-908f-49ab823ed229@redhat.com> Message-ID: On Tue, Aug 4, 2020 at 5:42 AM Bogdan Dobrelya wrote: (...) > I can understand that ansible should not be fixed for some composition > tasks what require iterations and have complex logic for its "unit of > work". And such ones also should be unit tested indeed. What I do not > fully understand though is then what abandoning paunch for its action > plugin had bought for us in the end? > > Paunch was self-contained and had no external dependencies on > fast-changing ansible frameworks. There was also no need for paunch to > handle the ansible-specific execution strategies and nuances, like "what > if that action plugin is called in async or in the check-mode?" Unit > tests exited in paunch as well. It was easy to backport changes within a > single code base. > > So, looking back retrospectively, was rewriting paunch as an action > plugin a simplification of the deployment framework? Please reply to > yourself honestly. It does pretty same things but differently and added > external framework. It is now also self-contained action plugin, since > traditional tasks cannot be used in loops for this goal because of > performance reasons. > I asked myself the same questions several times and to me the major driver around removing paunch was to move the container logic into Ansible modules which would be community supported vs paunch-runner code. The Ansible role version has also brought more plugability with other components of the framework (Ansible strategies, debugging, etc) but I don't think it was the major reason why we wrote it. The simplification aspect was mostly about removing the CLI which I don't think was needed at the end; and also the unique name thing which was a mistake and caused us many troubles as you know. To summarize, action plugins may be a good solution indeed, but perhaps > we should go back and use paunch instead of ansible? Same applies for > *some* other tasks? That would also provide a balance, for action > plugins, tasks and common sense. 
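For reference, the Ceph telemetry workflow Tim points to above comes down to a couple of mgr commands (exact flags and defaults vary a little between Ceph releases, so treat this as a sketch):

  # Enable the module if it is not already on, then preview before opting in.
  ceph mgr module enable telemetry
  ceph telemetry show   # prints exactly what would be reported
  ceph telemetry on     # explicit opt-in; nothing is sent until this is run

The preview step is what implements the "easy way to see what you would send" point made above.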
> Sagi is currently working on replacing the podman_containers action plugin by a module that will be able to process multiple containers at the same time, so we'll have less tasks and potentially faster operations at scale. -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Aug 10 20:53:26 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 10 Aug 2020 20:53:26 +0000 Subject: [Ironic] User Survey question In-Reply-To: References: Message-ID: <20200810205325.42rr7i4qr4jhdow3@yuggoth.org> On 2020-08-10 22:37:27 +0200 (+0200), timbell wrote: > Ceph has done something like this with good results - > https://docs.ceph.com/docs/master/mgr/telemetry/ > > I think the things that have helped this to be successful are > > - easy way to see what you would send > - option (not the default) to provide more details such as company > and contact [...] Other prior art which springs to mind: Debian has provided a popcon tool for ages, as an opt-in means of periodically providing feedback on what packages are seeing use in their distro. It's current incarnation can submit reports via SMTP or HTTP protocols for added flexibility. https://popcon.debian.org/ OpenBSD takes a low-effort approach and suggests a command in their install guide which the admin can run to send a copy of dmesg output to the project so they can keep track of what sorts of hardware is running their operating system out in the wild. https://www.openbsd.org/faq/faq4.html#SendDmesg -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From Arkady.Kanevsky at dell.com Mon Aug 10 21:06:19 2020 From: Arkady.Kanevsky at dell.com (Kanevsky, Arkady) Date: Mon, 10 Aug 2020 21:06:19 +0000 Subject: [Ironic] User Survey question In-Reply-To: <6E7AE88E-3E6A-4791-AFDB-600DB8E2AC40@openstack.org> References: <6E7AE88E-3E6A-4791-AFDB-600DB8E2AC40@openstack.org> Message-ID: Do we know what % of current deployments use Ironic? I recall several years back it was 25%. But do not recall seeing latest info. Then Julia question on size and which components of Ironic are being used. Maybe if we treat "N/A" answers as they do not use Ironic it first into a single question. I do love open ended question where users can ask for improvements/extensions. Thanks, Arkady From: Allison Price Sent: Monday, August 10, 2020 12:31 PM To: Ruby Loo Cc: Julia Kreger; openstack-discuss Subject: Re: [Ironic] User Survey question [EXTERNAL EMAIL] Coming solely from the User Survey POV, each project and SIG is allowed up to 2 questions. We create that limit to ensure that the survey does not get too terribly long. If the Ironic team would like to add one question, we can. Thanks! Allison On Aug 10, 2020, at 12:28 PM, Ruby Loo > wrote: Hi Julia, Please remind me, are we allowed one question? I was wondering what prevents us from having this tool and then announcing/asking folks to provide the information. Or is the idea that if no one says 'yes', it would be a waste of time to provide such a tool? My concern is that if this is the only question we are allowed to ask, we might not get that much useful information. What about pain-points wrt ironic? Could we ask that? --ruby On Mon, Aug 10, 2020 at 12:23 PM Julia Kreger > wrote: Greetings awesome Artificial Intelligences and fellow humanoid carbon units! 
This week I need to submit the question for the 2021 user survey. We discussed this some during our weekly IRC meeting today.[0] Presently, the question is: "Ironic: What would you find most useful if it was part of ironic?" I'd like to propose we collect more data in order to enable us to make informed decisions for features and maintenance work moving forward. While this is long term thinking, I'm wondering if operators would be interested in collecting and submitting some basic data or using a tool, to submit anonymous usage data so we can gain insight into hardware types in use, numbers of machines, which interfaces are used, etc. So I'm thinking something along the lines of: "Ironic: Would you be willing to submit anonymous usage statistics (Number of nodes, conductors, which drivers are in use, etc) if such a tool existed? Yes/No/Not Applicable" Thoughts? Feelings? Concerns? Other ideas? -Julia [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openstack.org Mon Aug 10 21:13:31 2020 From: allison at openstack.org (Allison Price) Date: Mon, 10 Aug 2020 16:13:31 -0500 Subject: [Ironic] User Survey question In-Reply-To: References: <6E7AE88E-3E6A-4791-AFDB-600DB8E2AC40@openstack.org> Message-ID: <1DFB0CD9-008D-4937-AD70-F3150BEC823F@openstack.org> As of right now for the 2020 version, we are sitting at 19% and for 2019, it was the same. > On Aug 10, 2020, at 4:06 PM, Kanevsky, Arkady wrote: > > Do we know what % of current deployments use Ironic? > I recall several years back it was 25%. But do not recall seeing latest info. > Then Julia question on size and which components of Ironic are being used. > Maybe if we treat “N/A” answers as they do not use Ironic it first into a single question. > I do love open ended question where users can ask for improvements/extensions. > Thanks, > Arkady > > From: Allison Price > > Sent: Monday, August 10, 2020 12:31 PM > To: Ruby Loo > Cc: Julia Kreger; openstack-discuss > Subject: Re: [Ironic] User Survey question > > [EXTERNAL EMAIL] > > Coming solely from the User Survey POV, each project and SIG is allowed up to 2 questions. We create that limit to ensure that the survey does not get too terribly long. > > If the Ironic team would like to add one question, we can. > > Thanks! > Allison > > > > > On Aug 10, 2020, at 12:28 PM, Ruby Loo > wrote: > > Hi Julia, > > Please remind me, are we allowed one question? > > I was wondering what prevents us from having this tool and then announcing/asking folks to provide the information. Or is the idea that if no one says 'yes', it would be a waste of time to provide such a tool? My concern is that if this is the only question we are allowed to ask, we might not get that much useful information. > > What about pain-points wrt ironic? Could we ask that? > > --ruby > > On Mon, Aug 10, 2020 at 12:23 PM Julia Kreger > wrote: > Greetings awesome Artificial Intelligences and fellow humanoid carbon units! > > This week I need to submit the question for the 2021 user survey. We > discussed this some during our weekly IRC meeting today.[0] > > Presently, the question is: > > "Ironic: What would you find most useful if it was part of ironic?" > > I'd like to propose we collect more data in order to enable us to make > informed decisions for features and maintenance work moving forward. 
> While this is long term thinking, I'm wondering if operators would be > interested in collecting and submitting some basic data or using a > tool, to submit anonymous usage data so we can gain insight into > hardware types in use, numbers of machines, which interfaces are used, > etc. > > So I'm thinking something along the lines of: > > "Ironic: Would you be willing to submit anonymous usage statistics > (Number of nodes, conductors, which drivers are in use, etc) if such a > tool existed? Yes/No/Not Applicable" > > Thoughts? Feelings? Concerns? Other ideas? > > -Julia > > > [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Mon Aug 10 22:07:29 2020 From: miguel at mlavalle.com (Miguel Lavalle) Date: Mon, 10 Aug 2020 17:07:29 -0500 Subject: [neutron] bug deputy report August 3 - 9 Message-ID: Critical ====== https://bugs.launchpad.net/neutron/+bug/1890445 [ovn] Tempest test test_update_router_admin_state failing very often. NEEDS OWNER https://bugs.launchpad.net/neutron/+bug/1890493 Periodic job neutron-ovn-tempest-ovs-master-fedora is failing 100% of times. SEEMS TO NEED AN OWNER High ==== https://bugs.launchpad.net/neutron/+bug/1890269 Fullstack test neutron.tests.fullstack.test_logging.TestLogging is failing on Ubuntu Focal. Proposed fix: https://review.opendev.org/#/c/734304/ https://bugs.launchpad.net/neutron/+bug/1890297 CI grenade jobs failing. Proposed fix: https://review.opendev.org/#/c/744753/1. Fix released https://bugs.launchpad.net/neutron/+bug/1890400 Default gateway in HA router namespace not set if using Keepalived 1.x. Awaiting patch from Slawek https://bugs.launchpad.net/neutron/+bug/1890353 support pyroute2 0.5.13. Awaiting patch from Rodolfo https://bugs.launchpad.net/neutron/+bug/1890432 Create subnet is failing under high load with OVN. WIP fix: https://review.opendev.org/#/c/745330/ Medium ====== https://bugs.launchpad.net/neutron/+bug/1890539 failed to create port with security group of other tenant. Proposed fix: https://review.opendev.org/#/c/745089 -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Aug 10 23:38:30 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 10 Aug 2020 16:38:30 -0700 Subject: [Ironic] User Survey question In-Reply-To: References: <6E7AE88E-3E6A-4791-AFDB-600DB8E2AC40@openstack.org> Message-ID: For me, it is less about aggregate usage of the respondent users, and more about collecting the actual utilization statistics for running deployments so we can make informed decisions. For example, If we know there is huge IPMI console usage, then we can know how to prioritize the same for redfish, or potentially not. I do also like the open-ended question response nature, but I've not found the data very valuable except to help tailor some of the outward facing communications for purposes of project updates. Largely because many users are not "zoomed in" at the level most contributors or even where everyday individual project users are. They are zoomed out looking at the whole. If OSF is willing for us to have two questions, I'm all for making use of it. I always thought we would only be permitted one. -Julia On Mon, Aug 10, 2020 at 2:06 PM Kanevsky, Arkady wrote: > > Do we know what % of current deployments use Ironic? > > I recall several years back it was 25%. But do not recall seeing latest info. 
> > Then Julia question on size and which components of Ironic are being used. > > Maybe if we treat “N/A” answers as they do not use Ironic it first into a single question. > > I do love open ended question where users can ask for improvements/extensions. > > Thanks, > > Arkady > > > > From: Allison Price > Sent: Monday, August 10, 2020 12:31 PM > To: Ruby Loo > Cc: Julia Kreger; openstack-discuss > Subject: Re: [Ironic] User Survey question > > > > [EXTERNAL EMAIL] > > Coming solely from the User Survey POV, each project and SIG is allowed up to 2 questions. We create that limit to ensure that the survey does not get too terribly long. > > > > If the Ironic team would like to add one question, we can. > > > > Thanks! > > Allison > > > > > > > > On Aug 10, 2020, at 12:28 PM, Ruby Loo wrote: > > > > Hi Julia, > > > > Please remind me, are we allowed one question? > > > > I was wondering what prevents us from having this tool and then announcing/asking folks to provide the information. Or is the idea that if no one says 'yes', it would be a waste of time to provide such a tool? My concern is that if this is the only question we are allowed to ask, we might not get that much useful information. > > > > What about pain-points wrt ironic? Could we ask that? > > > > --ruby > > > > On Mon, Aug 10, 2020 at 12:23 PM Julia Kreger wrote: > > Greetings awesome Artificial Intelligences and fellow humanoid carbon units! > > This week I need to submit the question for the 2021 user survey. We > discussed this some during our weekly IRC meeting today.[0] > > Presently, the question is: > > "Ironic: What would you find most useful if it was part of ironic?" > > I'd like to propose we collect more data in order to enable us to make > informed decisions for features and maintenance work moving forward. > While this is long term thinking, I'm wondering if operators would be > interested in collecting and submitting some basic data or using a > tool, to submit anonymous usage data so we can gain insight into > hardware types in use, numbers of machines, which interfaces are used, > etc. > > So I'm thinking something along the lines of: > > "Ironic: Would you be willing to submit anonymous usage statistics > (Number of nodes, conductors, which drivers are in use, etc) if such a > tool existed? Yes/No/Not Applicable" > > Thoughts? Feelings? Concerns? Other ideas? > > -Julia > > > [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html > > From jimmy at openstack.org Tue Aug 11 00:43:26 2020 From: jimmy at openstack.org (Jimmy McArthur) Date: Mon, 10 Aug 2020 19:43:26 -0500 Subject: [Ironic] User Survey question In-Reply-To: References: Message-ID: <10446121-C615-4345-A761-58F33E6C1DD8@getmailspring.com> I think originally it was one, but the survey grew enough that 2 questions were needed by a lot of projects. Definitely feel free to add another question. And I agree - having a free-text question is cool, but it doesn't help for tracking trends, by and large. Sometimes we're even able to fold two questions into one... so if you can give us an idea of exactly the data you want and how you would format the questions in a perfect world, we might be able to get more out of it. For instance, a follow up question like "Other, please explain" still only counts as the one question :) So if you want to do some second level dependency stuff, we might be able to extract more data out of it. 
Cheers, Jimmy On Aug 10 2020, at 6:38 pm, Julia Kreger wrote: > For me, it is less about aggregate usage of the respondent users, and > more about collecting the actual utilization statistics for running > deployments so we can make informed decisions. For example, If we know > there is huge IPMI console usage, then we can know how to prioritize > the same for redfish, or potentially not. > > I do also like the open-ended question response nature, but I've not > found the data very valuable except to help tailor some of the outward > facing communications for purposes of project updates. Largely because > many users are not "zoomed in" at the level most contributors or even > where everyday individual project users are. They are zoomed out > looking at the whole. > > If OSF is willing for us to have two questions, I'm all for making use > of it. I always thought we would only be permitted one. > > -Julia > On Mon, Aug 10, 2020 at 2:06 PM Kanevsky, Arkady > wrote: > > > > Do we know what % of current deployments use Ironic? > > > > I recall several years back it was 25%. But do not recall seeing latest info. > > > > Then Julia question on size and which components of Ironic are being used. > > > > Maybe if we treat “N/A” answers as they do not use Ironic it first into a single question. > > > > I do love open ended question where users can ask for improvements/extensions. > > > > Thanks, > > > > Arkady > > > > > > > > From: Allison Price > > Sent: Monday, August 10, 2020 12:31 PM > > To: Ruby Loo > > Cc: Julia Kreger; openstack-discuss > > Subject: Re: [Ironic] User Survey question > > > > > > > > [EXTERNAL EMAIL] > > > > Coming solely from the User Survey POV, each project and SIG is allowed up to 2 questions. We create that limit to ensure that the survey does not get too terribly long. > > > > > > > > If the Ironic team would like to add one question, we can. > > > > > > > > Thanks! > > > > Allison > > > > > > > > > > > > > > > > On Aug 10, 2020, at 12:28 PM, Ruby Loo wrote: > > > > > > > > Hi Julia, > > > > > > > > Please remind me, are we allowed one question? > > > > > > > > I was wondering what prevents us from having this tool and then announcing/asking folks to provide the information. Or is the idea that if no one says 'yes', it would be a waste of time to provide such a tool? My concern is that if this is the only question we are allowed to ask, we might not get that much useful information. > > > > > > > > What about pain-points wrt ironic? Could we ask that? > > > > > > > > --ruby > > > > > > > > On Mon, Aug 10, 2020 at 12:23 PM Julia Kreger wrote: > > > > Greetings awesome Artificial Intelligences and fellow humanoid carbon units! > > > > This week I need to submit the question for the 2021 user survey. We > > discussed this some during our weekly IRC meeting today.[0] > > > > Presently, the question is: > > > > "Ironic: What would you find most useful if it was part of ironic?" > > > > I'd like to propose we collect more data in order to enable us to make > > informed decisions for features and maintenance work moving forward. > > While this is long term thinking, I'm wondering if operators would be > > interested in collecting and submitting some basic data or using a > > tool, to submit anonymous usage data so we can gain insight into > > hardware types in use, numbers of machines, which interfaces are used, > > etc. 
> > > > So I'm thinking something along the lines of: > > > > "Ironic: Would you be willing to submit anonymous usage statistics > > (Number of nodes, conductors, which drivers are in use, etc) if such a > > tool existed? Yes/No/Not Applicable" > > > > Thoughts? Feelings? Concerns? Other ideas? > > > > -Julia > > > > > > [0]: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-08-10-15.00.log.html > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Tue Aug 11 10:09:31 2020 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 11 Aug 2020 12:09:31 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Thomas Goirand wrote: > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: >> Thanks, Pierre for helping with this. >> >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) >> but I am not sure if he got any response back. No response so far, but they may all be in company summer vacation. > The end of the very good maintenance of Cloudkitty matched the date when > objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? > > This is very disappointing as I've been using it for some time already, > and that I was satisfied by it (ie: it does the job...), and especially > that latest releases are able to scale correctly. > > I very much would love if Pierre Riteau was successful in taking over. > Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. Given the volunteers (Pierre, Rafael, Luis) I would support the TC using its unholy powers to add extra core reviewers to cloudkitty. If the current PTL comes back, I'm sure they will appreciate the help, and can always fix/revert things before Victoria release. -- Thierry Carrez (ttx) From thierry at openstack.org Tue Aug 11 10:13:56 2020 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 11 Aug 2020 12:13:56 +0200 Subject: [sigs][vendors] Proposal to create Hardware Vendor SIG In-Reply-To: References: <5d4928c2-8e14-82a7-c06b-dd8df4de44fb@gmx.com> Message-ID: Kanevsky, Arkady wrote: > Great idea. Long time overdue. > Great place for many out-of-tree repos. +1, great idea. -- Thierry From thierry at openstack.org Tue Aug 11 10:24:00 2020 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 11 Aug 2020 12:24:00 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> Message-ID: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> If you can reproduce it with current versions, I would suggest to file an issue on https://github.com/rabbitmq/rabbitmq-server/issues/ The behavior you describe seems to match https://github.com/rabbitmq/rabbitmq-server/issues/1873 but the maintainers seem to think it's been fixed by a number of somewhat-related changes in 3.7.13, because nobody reported issues anymore :) Fabian Zimmermann wrote: > Hi, > > dont know if durable queues help, but should be enabled by rabbitmq > policy which (alone) doesnt seem to fix this (we have this active) > >  Fabian > > Massimo Sgaravatto > schrieb am Sa., 8. Aug. 2020, 09:36: > > We also see the issue.  When it happens stopping and restarting the > rabbit cluster usually helps. 
> > I thought the problem was because of a wrong setting in the > openstack services conf files: I missed these settings (that I am > now going to add): > > [oslo_messaging_rabbit] > rabbit_ha_queues = true > amqp_durable_queues = true > > Cheers, Massimo > > > On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann > wrote: > > Hi, > > we also have this issue. > > Our solution was (up to now) to delete the queues with a script > or even reset the complete cluster. > > We just upgraded rabbitmq to the latest version - without luck. > > Anyone else seeing this issue? > >  Fabian > > > > Arnaud Morin > schrieb am Do., 6. Aug. 2020, > 16:47: > > Hey all, > > I would like to ask the community about a rabbit issue we > have from time > to time. > > In our current architecture, we have a cluster of rabbits (3 > nodes) for > all our OpenStack services (mostly nova and neutron). > > When one node of this cluster is down, the cluster continue > working (we > use pause_minority strategy). > But, sometimes, the third server is not able to recover > automatically > and need a manual intervention. > After this intervention, we restart the rabbitmq-server > process, which > is then able to join the cluster back. > > At this time, the cluster looks ok, everything is fine. > BUT, nothing works. > Neutron and nova agents are not able to report back to servers. > They appear dead. > Servers seems not being able to consume messages. > The exchanges, queues, bindings seems good in rabbit. > > What we see is that removing bindings (using rabbitmqadmin > delete > binding or the web interface) and recreate them again (using > the same > routing key) brings the service back up and running. > > Doing this for all queues is really painful. Our next plan is to > automate it, but is there anyone in the community already > saw this kind > of issues? > > Our bug looks like the one described in [1]. > Someone recommands to create an Alternate Exchange. > Is there anyone already tried that? > > FYI, we are running rabbit 3.8.2 (with OpenStack Stein). > We had the same kind of issues using older version of rabbit. > > Thanks for your help. > > [1] > https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk > > -- > Arnaud Morin > > -- Thierry Carrez (ttx) From arnaud.morin at gmail.com Tue Aug 11 10:28:43 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Tue, 11 Aug 2020 10:28:43 +0000 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> Message-ID: <20200811102843.GS31915@sync> Thanks for those tips, I will check both values asap. About the complete reset of the cluster, this is also what we use to do, but this has some downside, such as the need to restart all agents, services, etc) Cheers, -- Arnaud Morin On 08.08.20 - 15:06, Fabian Zimmermann wrote: > Hi, > > dont know if durable queues help, but should be enabled by rabbitmq policy > which (alone) doesnt seem to fix this (we have this active) > > Fabian > > Massimo Sgaravatto schrieb am Sa., 8. Aug. > 2020, 09:36: > > > We also see the issue. When it happens stopping and restarting the rabbit > > cluster usually helps. 
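For anyone wanting to try both suggestions from this thread together, a sketch of what they look like in practice. The policy name, pattern and vhost below are illustrative, so check them against your own cluster before applying anything:

  # Mirror ("HA") all non-amq queues in the given vhost via a RabbitMQ policy.
  rabbitmqctl set_policy -p / ha-all '^(?!amq\.).*' '{"ha-mode":"all"}' --apply-to queues

  # Plus the oslo.messaging options quoted above, set in each service's
  # configuration (nova.conf, neutron.conf, ...) followed by a service restart:
  #   [oslo_messaging_rabbit]
  #   rabbit_ha_queues = true
  #   amqp_durable_queues = true

Note that, as Fabian points out above, the policy alone has not been enough to make the stale-binding problem go away.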
> > > > I thought the problem was because of a wrong setting in the openstack > > services conf files: I missed these settings (that I am now going to add): > > > > [oslo_messaging_rabbit] > > rabbit_ha_queues = true > > amqp_durable_queues = true > > > > Cheers, Massimo > > > > > > On Sat, Aug 8, 2020 at 6:34 AM Fabian Zimmermann > > wrote: > > > >> Hi, > >> > >> we also have this issue. > >> > >> Our solution was (up to now) to delete the queues with a script or even > >> reset the complete cluster. > >> > >> We just upgraded rabbitmq to the latest version - without luck. > >> > >> Anyone else seeing this issue? > >> > >> Fabian > >> > >> > >> > >> Arnaud Morin schrieb am Do., 6. Aug. 2020, > >> 16:47: > >> > >>> Hey all, > >>> > >>> I would like to ask the community about a rabbit issue we have from time > >>> to time. > >>> > >>> In our current architecture, we have a cluster of rabbits (3 nodes) for > >>> all our OpenStack services (mostly nova and neutron). > >>> > >>> When one node of this cluster is down, the cluster continue working (we > >>> use pause_minority strategy). > >>> But, sometimes, the third server is not able to recover automatically > >>> and need a manual intervention. > >>> After this intervention, we restart the rabbitmq-server process, which > >>> is then able to join the cluster back. > >>> > >>> At this time, the cluster looks ok, everything is fine. > >>> BUT, nothing works. > >>> Neutron and nova agents are not able to report back to servers. > >>> They appear dead. > >>> Servers seems not being able to consume messages. > >>> The exchanges, queues, bindings seems good in rabbit. > >>> > >>> What we see is that removing bindings (using rabbitmqadmin delete > >>> binding or the web interface) and recreate them again (using the same > >>> routing key) brings the service back up and running. > >>> > >>> Doing this for all queues is really painful. Our next plan is to > >>> automate it, but is there anyone in the community already saw this kind > >>> of issues? > >>> > >>> Our bug looks like the one described in [1]. > >>> Someone recommands to create an Alternate Exchange. > >>> Is there anyone already tried that? > >>> > >>> FYI, we are running rabbit 3.8.2 (with OpenStack Stein). > >>> We had the same kind of issues using older version of rabbit. > >>> > >>> Thanks for your help. > >>> > >>> [1] > >>> https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk > >>> > >>> -- > >>> Arnaud Morin > >>> > >>> > >>> From arnaud.morin at gmail.com Tue Aug 11 10:33:15 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Tue, 11 Aug 2020 10:33:15 +0000 Subject: [largescale-sig] Next meeting: August 12, 16utc In-Reply-To: <6e7a4e43-08f4-3030-2eb0-9311f27d9647@openstack.org> References: <6e7a4e43-08f4-3030-2eb0-9311f27d9647@openstack.org> Message-ID: <20200811103315.GT31915@sync> Hi Thierry and all, Thank you for bringing that up. I am off this week and will not be able to attend. Also, my TODO is still on TODO :( Cheers -- Arnaud Morin On 10.08.20 - 12:01, Thierry Carrez wrote: > Hi everyone, > > In order to accommodate US members, the Large Scale SIG recently decided to > rotate between an EU-APAC-friendly time and an US-EU-friendly time. 
> > Our next meeting will be the first US-EU meeting, on Wednesday, August 12 at > 16 UTC[1] in the #openstack-meeting-3 channel on IRC: > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200812T16 > > Feel free to add topics to our agenda at: > > https://etherpad.openstack.org/p/large-scale-sig-meeting > > A reminder of the TODOs we had from last meeting, in case you have time to > make progress on them: > > - amorin to add some meat to the wiki page before we push the Nova doc patch > further > - all to describe briefly how you solved metrics/billing in your deployment > in https://etherpad.openstack.org/p/large-scale-sig-documentation > > Talk to you all on Wednesday, > > -- > Thierry Carrez > From radoslaw.piliszek at gmail.com Tue Aug 11 11:05:24 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 11 Aug 2020 13:05:24 +0200 Subject: [all] [dev&ops] Early warning about new "stable" ansible releases Message-ID: Hiya Folks, Ansible 2.8.14 and 2.9.12 change the default mode, that created files will get, from 0666 (with umask; which would usually produce 0644) to 0600. [1] Kolla-Ansible got hit by it, and Zuul relies on Ansible so might pick it up at some point, possibly causing some little havoc for all of us. [1] https://github.com/ansible/ansible/issues/71200 -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From kotobi at dkrz.de Tue Aug 11 11:11:02 2020 From: kotobi at dkrz.de (Amjad Kotobi) Date: Tue, 11 Aug 2020 13:11:02 +0200 Subject: [horizon][dashboard] Disable admin and identity dashboard panel for user role Message-ID: Hi, I’m trying to customise view level of dashboard to users with “User” role in keystone, by that I meant to disable “admin” + “identity” panels for users, but when I’m adding “DISABLED = True” to admin panel, it will disable panel for admin and user roles. Is there any way to disable “admin” & “identity” panels only for user role? Installed openstack-dashboard openstack-dashboard-16.2.0-1.el7 Thanks Amjad -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5223 bytes Desc: not available URL: From moreira.belmiro.email.lists at gmail.com Tue Aug 11 12:22:40 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 11 Aug 2020 14:22:40 +0200 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: Message-ID: Hi Radosław, no, it's not true for every project. There are projects that have completely migrated to OSC (for example, Keystone). Other projects still have discrepancies (for example, Nova, Glance). Belmiro On Mon, Aug 10, 2020 at 10:26 AM Radosław Piliszek < radoslaw.piliszek at gmail.com> wrote: > On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < > moreira.belmiro.email.lists at gmail.com> wrote: > >> Hi, >> during the last PTG the TC discussed the problem of supporting different >> clients (OpenStack Client - OSC vs python-*clients) [1]. >> Currently, we don't have feature parity between the OSC and the >> python-*clients. >> > > Is it true of any client? I guess some are just OSC plugins 100%. > Do we know which clients have this disparity? > Personally, I encountered this with Glance the most and Cinder to some > extent (but I believe over the course of action Cinder got all features I > wanted from it in the OSC). 
> > -yoctozepto > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Mon Aug 10 22:01:06 2020 From: thomas.king at gmail.com (Thomas King) Date: Mon, 10 Aug 2020 16:01:06 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: The node will PXE boot, but having the provisioning network separate from the control plane network, and having a specific route back to the remote subnet causes a LOT of issues. With the specific route, the remote node will PXE boot but not talk to the ironic API service on the controller node. Without the specific route, the remote node can talk to the ironic API but cannot PXE boot off the provisioning network. Unless I add a bunch of network namespace stuff, the simple answer is to move *everything* onto the control plane. The docs dissuade against this, however, apparently for security reasons. Moving everything onto the control plane network seems to be the obvious but less desirable choice. Tom King On Tue, Aug 4, 2020 at 4:22 PM Thomas King wrote: > Getting closer. I was able to create the segment and the subnet for the > remote network on that segment. > > When I attempted to provide the baremetal node, Neutron is unable to > create/attach a port to the remote node: > WARNING ironic.common.neutron [req-b3f373fc-e76a-4c13-9ebb-41cfc682d31b > 4946f15716c04f8585d013e364802c6c 1664a38fc668432ca6bee9189be142d9 - default > default] The local_link_connection is required for 'neutron' network > interface and is not present in the nodes > 3ed87e51-00c5-4b27-95c0-665c8337e49b port > ccc335c6-3521-48a5-927d-d7ee13f7f05b > > I changed its network interface from neutron back to flat and it went past > this. I'm now waiting to see if the node will PXE boot. > > On Tue, Aug 4, 2020 at 1:41 PM Thomas King wrote: > >> Changing the ml2 flat_networks from specific physical networks to a >> wildcard allowed me to create a segment. I may be unstuck. >> >> New config: >> [ml2_type_flat] >> flat_networks=* >> >> Now to try creating the subnet and try a remote provision. >> >> Tom King >> >> On Mon, Aug 3, 2020 at 3:58 PM Thomas King wrote: >> >>> I've been using named physical networks so long, I completely forgot >>> using wildcards! >>> >>> Is this the answer???? >>> >>> https://docs.openstack.org/mitaka/config-reference/networking/networking_options_reference.html#modular-layer-2-ml2-flat-type-configuration-options >>> >>> Tom King >>> >>> On Tue, Jul 28, 2020 at 3:46 PM Thomas King >>> wrote: >>> >>>> Ruslanas has been a tremendous help. To catch up the discussion lists... >>>> 1. I enabled Neutron segments. >>>> 2. I renamed the existing segments for each network so they'll make >>>> sense. >>>> 3. I attempted to create a segment for a remote subnet (it is using >>>> DHCP relay) and this was the error that is blocking me. This is where the >>>> docs do not cover: >>>> [root at sea-maas-controller ~(keystone_admin)]# openstack network >>>> segment create --physical-network remote146-30-32 --network-type flat >>>> --network baremetal seg-remote-146-30-32 >>>> BadRequestException: 400: Client Error for url: >>>> http://10.146.30.65:9696/v2.0/segments, Invalid input for operation: >>>> physical_network 'remote146-30-32' unknown for flat provider network. >>>> >>>> I've asked Ruslanas to clarify how their physical networks correspond >>>> to their remote networks. They have a single provider network and multiple >>>> segments tied to multiple physical networks. 
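Consolidating the commands scattered through this thread into one hedged sketch (placeholders are used wherever the thread does not give a concrete value; the ml2 wildcard has to be in place and neutron-server restarted before the segment create passes validation):

  # ml2_conf.ini
  #   [ml2_type_flat]
  #   flat_networks = *

  openstack network segment create --network baremetal --network-type flat \
      --physical-network remote146-30-32 seg-remote-146-30-32

  openstack subnet create --network baremetal --network-segment seg-remote-146-30-32 \
      --subnet-range <remote CIDR> --dhcp subnet-remote-146-30-32

  # The 'neutron' network interface additionally requires switch details on
  # each baremetal port (placeholder values):
  openstack baremetal port set <port-uuid> \
      --local-link-connection switch_id=<switch MAC> \
      --local-link-connection port_id=<switch port name>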
>>>> >>>> However, if anyone can shine some light on this, I would greatly >>>> appreciate it. How should neutron's configurations accommodate remote >>>> networks<->Neutron segments when I have only one physical network >>>> attachment for provisioning? >>>> >>>> Thanks! >>>> Tom King >>>> >>>> On Wed, Jul 15, 2020 at 3:33 PM Thomas King >>>> wrote: >>>> >>>>> That helps a lot, thank you! >>>>> >>>>> "I use only one network..." >>>>> This bit seems to go completely against the Neutron segments >>>>> documentation. When you have access, please let me know if Triple-O is >>>>> using segments or some other method. >>>>> >>>>> I greatly appreciate this, this is a tremendous help. >>>>> >>>>> Tom King >>>>> >>>>> On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis >>>>> wrote: >>>>> >>>>>> Hi Thomas, >>>>>> >>>>>> I have a bit complicated setup from tripleo side :) I use only one >>>>>> network (only ControlPlane). thanks to Harold, he helped to make it work >>>>>> for me. >>>>>> >>>>>> Yes, as written in the tripleo docs for leaf networks, it use the >>>>>> same neutron network, different subnets. so neutron network is ctlplane (I >>>>>> think) and have ctlplane-subnet, remote-provision and remote-KI :)) that >>>>>> generates additional lines in "ip r s" output for routing "foreign" subnets >>>>>> through correct gw, if you would have isolated networks, by vlans and ports >>>>>> this would apply for each subnet different gw... I believe you >>>>>> know/understand that part. >>>>>> >>>>>> remote* subnets have dhcp-relay setup by network team... do not ask >>>>>> details for that. I do not know how to, but can ask :) >>>>>> >>>>>> >>>>>> in undercloud/tripleo i have 2 dhcp servers, one is for >>>>>> introspection, another for provide/cleanup and deployment process. >>>>>> >>>>>> all of those subnets have organization level tagged networks and are >>>>>> tagged on network devices, but they are untagged on provisioning >>>>>> interfaces/ports, as in general pxe should be untagged, but some nic's can >>>>>> do vlan untag on nic/bios level. but who cares!? >>>>>> >>>>>> I just did a brief check on your first post, I think I have simmilar >>>>>> setup to yours :)) I will check in around 12hours :)) more deaply, as will >>>>>> be at work :))) >>>>>> >>>>>> >>>>>> P.S. sorry for wrong terms, I am bad at naming. >>>>>> >>>>>> >>>>>> On Wed, 15 Jul 2020, 21:13 Thomas King, >>>>>> wrote: >>>>>> >>>>>>> Ruslanas, that would be excellent! >>>>>>> >>>>>>> I will reply to you directly for details later unless the maillist >>>>>>> would like the full thread. >>>>>>> >>>>>>> Some preliminary questions: >>>>>>> >>>>>>> - Do you have a separate physical interface for the segment(s) >>>>>>> used for your remote subnets? >>>>>>> The docs state each segment must have a unique physical network >>>>>>> name, which suggests a separate physical interface for each segment unless >>>>>>> I'm misunderstanding something. >>>>>>> - Are your provisioning segments all on the same Neutron >>>>>>> network? >>>>>>> - Are you using tagged switchports or access switchports to your >>>>>>> Ironic server(s)? >>>>>>> >>>>>>> Thanks, >>>>>>> Tom King >>>>>>> >>>>>>> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis < >>>>>>> ruslanas at lpic.lt> wrote: >>>>>>> >>>>>>>> I have deployed that with tripleO, but now we are recabling and >>>>>>>> redeploying it. 
So once I have it running I can share my configs, just name >>>>>>>> which you want :) >>>>>>>> >>>>>>>> On Tue, 14 Jul 2020 at 18:40, Thomas King >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I have. That's the Triple-O docs and they don't go through the >>>>>>>>> normal .conf files to explain how it works outside of Triple-O. It has some >>>>>>>>> ideas but no running configurations. >>>>>>>>> >>>>>>>>> Tom King >>>>>>>>> >>>>>>>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis < >>>>>>>>> ruslanas at lpic.lt> wrote: >>>>>>>>> >>>>>>>>>> hi, have you checked: >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>>>>>>> ? >>>>>>>>>> I am following this link. I only have one network, having >>>>>>>>>> different issues tho ;) >>>>>>>>>> >>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knikolla at bu.edu Tue Aug 11 15:22:58 2020 From: knikolla at bu.edu (Nikolla, Kristi) Date: Tue, 11 Aug 2020 15:22:58 +0000 Subject: [keystone] Weekly meeting cancelled today Message-ID: Hi all, There are no items in today's weekly meeting agenda, and I'm unavailable to host/attend it due to a scheduling conflict. Therefore we can go ahead and cancel today's meeting. Thank you, and sorry for any inconvenience Kristi Nikolla From openstack at nemebean.com Tue Aug 11 20:20:43 2020 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 11 Aug 2020 15:20:43 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <671fec63-8bea-4215-c773-d8360e368a99@sap.com> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> Message-ID: On 7/28/20 3:02 AM, Johannes Kulik wrote: > Hi, > > On 7/27/20 7:08 PM, Dan Smith wrote: >> >> The primary concern was about something other than nova sitting on our >> bus making calls to our internal services. I imagine that the proposal >> to bake it into oslo.messaging is for the same purpose, and I'd probably >> have the same concern. At the time I think we agreed that if we were >> going to support direct-to-service health checks, they should be teensy >> HTTP servers with oslo healthchecks middleware. Further loading down >> rabbit with those pings doesn't seem like the best plan to >> me. Especially since Nova (compute) services already check in over RPC >> periodically and the success of that is discoverable en masse through >> the API. >> >> --Dan >> > > While I get this concern, we have seen the problem described by the > original poster in production multiple times: nova-compute reports to be > healthy, is seen as up through the API, but doesn't work on any messages > anymore. > A health-check going through rabbitmq would really help spotting those > situations, while having an additional HTTP server doesn't. I wonder if this does help though. It seems like a bug that a nova-compute service would stop processing messages and still be seen as up in the service status. Do we understand why that is happening? If not, I'm unclear that a ping living at the oslo.messaging layer is going to do a better job of exposing such an outage. The fact that oslo.messaging is responding does not necessarily equate to nova-compute functioning as expected. To be clear, this is not me nacking the ping feature. I just want to make sure we understand what is going on here so we don't add another unreliable healthchecking mechanism to the one we already have. 
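For reference, the externally driven probe being discussed boils down to something like the sketch below. It assumes the proposed, not-yet-merged 'ping' endpoint (or an equivalent endpoint added by hand) exists on the target service, and the transport URL, topic and server names are placeholders:

from oslo_config import cfg
import oslo_messaging

def rpc_ping(transport_url, topic, server, timeout=10):
    # Build a one-off RPC client pointed at a specific service's topic queue.
    transport = oslo_messaging.get_rpc_transport(cfg.CONF, url=transport_url)
    target = oslo_messaging.Target(topic=topic, server=server)
    client = oslo_messaging.RPCClient(transport, target)
    try:
        # call() blocks until the remote endpoint answers or the timeout hits.
        client.prepare(timeout=timeout).call({}, 'ping')
        return True
    except oslo_messaging.MessagingTimeout:
        return False
    finally:
        transport.cleanup()

if __name__ == '__main__':
    alive = rpc_ping('rabbit://user:password@rabbit.example.com:5672/',
                     topic='compute', server='compute-01.example.com')
    print('alive' if alive else 'no reply (timed out)')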
> > Have a nice day, > Johannes > From openstack at nemebean.com Tue Aug 11 20:28:05 2020 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 11 Aug 2020 15:28:05 -0500 Subject: [largescale-sig] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> Message-ID: On 8/3/20 9:21 AM, Mohammed Naser wrote: > 3. You mentioned you're moving towards Kubernetes, we're doing the > same and building an operator: > https://opendev.org/vexxhost/openstack-operator -- Because the > operator manages the whole thing and Kubernetes does it's thing too, > we started moving towards 1 (single) rabbitmq per service, which > reaaaaaaally helped a lot in stabilizing things. Oslo messaging is a > lot better at recovering when a single service IP is pointing towards > it because it doesn't do weird things like have threads trying to > connect to other Rabbit ports. Just a thought. On a related note, LINE actually broke it down even further than that. There are details of their design in [0], but essentially they have downstream changes where they can specify a transport per notification topic to further separate out rabbit traffic. The spec hasn't been implemented yet upstream, but I thought I'd mention it since it seems relevant to this discussion. 0: https://specs.openstack.org/openstack/oslo-specs/specs/victoria/support-transports-per-oslo-notifications.html From smooney at redhat.com Tue Aug 11 21:20:07 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Aug 2020 22:20:07 +0100 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> Message-ID: <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: > > On 7/28/20 3:02 AM, Johannes Kulik wrote: > > Hi, > > > > On 7/27/20 7:08 PM, Dan Smith wrote: > > > > > > The primary concern was about something other than nova sitting on our > > > bus making calls to our internal services. I imagine that the proposal > > > to bake it into oslo.messaging is for the same purpose, and I'd probably > > > have the same concern. At the time I think we agreed that if we were > > > going to support direct-to-service health checks, they should be teensy > > > HTTP servers with oslo healthchecks middleware. Further loading down > > > rabbit with those pings doesn't seem like the best plan to > > > me. Especially since Nova (compute) services already check in over RPC > > > periodically and the success of that is discoverable en masse through > > > the API. > > > > > > --Dan > > > > > > > While I get this concern, we have seen the problem described by the > > original poster in production multiple times: nova-compute reports to be > > healthy, is seen as up through the API, but doesn't work on any messages > > anymore. > > A health-check going through rabbitmq would really help spotting those > > situations, while having an additional HTTP server doesn't. > > I wonder if this does help though. It seems like a bug that a > nova-compute service would stop processing messages and still be seen as > up in the service status. it kind of is a bug this one to be precise https://bugs.launchpad.net/nova/+bug/1854992 > Do we understand why that is happening? 
Assuming it is https://bugs.launchpad.net/nova/+bug/1854992, then the reason the compute status is still up is that the compute service is running fine and sending heartbeats; the issue is that under certain failure modes the topic queue used to receive RPC topic sends can disappear. One way this can happen is if the rabbitmq server restarts, in which case the resend code in oslo will reconnect to the exchange but it will not necessarily recreate the topic queue.

> If > not, I'm unclear that a ping living at the oslo.messaging layer is going > to do a better job of exposing such an outage. The fact that > oslo.messaging is responding does not necessarily equate to nova-compute > functioning as expected.

Maybe to say that a little more clearly: https://bugs.launchpad.net/nova/+bug/1854992 has other causes beyond the RabbitMQ server crashing, but the underlying effect is the same: the queue that the compute service uses to receive RPC calls is destroyed and not recreated. A related oslo bug, https://bugs.launchpad.net/oslo.messaging/+bug/1661510, was "fixed" by adding the mandatory transport flag feature (you can probably mark that as fix released, by the way).

From a nova perspective, the intended way to fix the nova bug was to use the new mandatory flag, catch the MessageUndeliverable exception, and have the conductor/API recreate the compute service's topic queue and resend the AMQP message. An open question is whether the compute service will detect that and start processing the queue again. If that does not fix the problem, plan B was to add a self ping to the compute service, where the compute service, on a long timeout (once an hour, maybe once every 15 minutes at the most), would try to send a message to its own receive queue. If it got the MessageUndeliverable exception then the compute service would recreate its own queue. Adding an inter-service ping or triggering the ping externally is unlikely to help with the nova bug. Ideally we would prefer to have the conductor/API recreate the queue and resend the message if it detects the queue is missing, rather than have a self ping, as that does not add additional load to the message bus and only recreates the queue if it is needed.

I'm not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug that is motivating the creation of this oslo ping feature, but that feels premature if it is. I think it would be better to try to address this by having the sender recreate the queue if the delivery fails, and if that is not viable, then prototype the fix in nova. If the self ping fixes this missing queue error then we could extract the code into oslo.

> > To be clear, this is not me nacking the ping feature. I just want to > make sure we understand what is going on here so we don't add another > unreliable healthchecking mechanism to the one we already have. > > > > > Have a nice day, > > Johannes > > > >

From melwittt at gmail.com Tue Aug 11 21:53:07 2020 From: melwittt at gmail.com (melanie witt) Date: Tue, 11 Aug 2020 14:53:07 -0700 Subject: [gate][keystone] *-grenade-multinode jobs failing with UnicodeDecodeError in keystone Message-ID: <45926788-6dcf-8825-5bfd-b6353b5facf0@gmail.com> Howdy all, FYI the *-grenade-multinode gate jobs are currently failing with the following error in keystone: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 3: invalid start byte This appears to be an issue with a new default data format in msgpack v1.0 [1] which was brought in by a recent bump of upper constraints [2].
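A small standalone reproduction of that class of failure, using only the documented msgpack 1.0 default changes (this is not keystone's actual cache code):

import msgpack

# Simulate a cache entry written under the old (< 1.0) defaults, where
# use_bin_type=False stored text and binary payloads in the same ambiguous
# "raw" family.
legacy_entry = msgpack.packb({'data': b'\x87\xa3\xff\x01'}, use_bin_type=False)

# msgpack 1.0 unpacks with raw=False by default, i.e. it tries to decode
# every raw field as UTF-8, which fails on binary payloads like this one.
try:
    msgpack.unpackb(legacy_entry)
except UnicodeDecodeError as exc:
    print('1.0 default unpack failed:', exc)

# Asking for the old behaviour reads the legacy entry back unchanged.
print(msgpack.unpackb(legacy_entry, raw=True))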
*-grenade-multinode jobs are affected because they test a rolling upgrade where the controller is upgraded to the N release version but one compute node is on the N-1 release version. It looks like cached keystone tokens being used by the N-1 node are erroring out during msgpack unpacking because they are in the old data format and msgpack v1.0 has a new default data format. I've opened a bug [3] about and I'm trying out the following keystone patch to fix it: https://review.opendev.org/745752 Reviews appreciated. If this is not the best approach or if this affects other projects as well, alternatively we could revert the upper constraint bump to msgpack v1.0 while we figure out the best fix. Cheers, -melanie [1] https://github.com/msgpack/msgpack-python/blob/v1.0.0/README.md#major-breaking-changes-in-msgpack-10 [2] https://review.opendev.org/#/c/745437/2/upper-constraints.txt at 373 [3] https://launchpad.net/bugs/1891244 From mike.carden at gmail.com Tue Aug 11 22:00:44 2020 From: mike.carden at gmail.com (Mike Carden) Date: Wed, 12 Aug 2020 08:00:44 +1000 Subject: [horizon][dashboard] Disable admin and identity dashboard panel for user role In-Reply-To: References: Message-ID: Hi Amjad. You don't say what version of OpenStack you are running, but I thought I would just mention that in Queens at least, the Identity tab in Horizon is essential for users if they belong to more than ~20 Projects because the Project drop-down in the GUI won't display them all and the user needs the Identity tab to select any projects not shown in the drop-down. This may not apply to you, but I think it's worth being aware of. -- MC -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Aug 12 02:57:02 2020 From: melwittt at gmail.com (melanie witt) Date: Tue, 11 Aug 2020 19:57:02 -0700 Subject: [gate][keystone] *-grenade-multinode jobs failing with UnicodeDecodeError in keystone In-Reply-To: <45926788-6dcf-8825-5bfd-b6353b5facf0@gmail.com> References: <45926788-6dcf-8825-5bfd-b6353b5facf0@gmail.com> Message-ID: <67a115ba-f80a-ebe5-8689-922e3bbb9a40@gmail.com> On 8/11/20 14:53, melanie witt wrote: > Howdy all, > > FYI the *-grenade-multinode gate jobs are currently failing with the following error in keystone: > > UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 3: invalid start byte > > This appears to be an issue with a new default data format in msgpack v1.0 [1] which was brought in by a recent bump of upper constraints [2]. > > *-grenade-multinode jobs are affected because they test a rolling upgrade where the controller is upgraded to the N release version but one compute node is on the N-1 release version. It looks like cached keystone tokens being used by the N-1 node are erroring out during msgpack unpacking because they are in the old data format and msgpack v1.0 has a new default data format. > > I've opened a bug [3] about and I'm trying out the following keystone patch to fix it: > > https://review.opendev.org/745752 > > Reviews appreciated. > > If this is not the best approach or if this affects other projects as well, alternatively we could revert the upper constraint bump to msgpack v1.0 while we figure out the best fix. 
Here's a patch for reverting the upper constraint for msgpack in case that approach is preferred: https://review.opendev.org/745769 > [1] https://github.com/msgpack/msgpack-python/blob/v1.0.0/README.md#major-breaking-changes-in-msgpack-10 > [2] https://review.opendev.org/#/c/745437/2/upper-constraints.txt at 373 > [3] https://launchpad.net/bugs/1891244 > From dev.faz at gmail.com Wed Aug 12 04:44:03 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 12 Aug 2020 06:44:03 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <20200806140421.GN31915@sync> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <88c24f3a-7d29-aa39-ed12-803279cc90c1@openstack.org> <20200806140421.GN31915@sync> Message-ID: Hi, would be great if you could share your script. Fabian Arnaud Morin schrieb am Do., 6. Aug. 2020, 16:11: > Hey all, > > Thanks for your replies. > About the fact that nova already implement this, I will try again on my > side, but maybe it was not yet implemented in newton (I only tried nova > on newton version). Thank you for bringing that to me. > > About the healhcheck already done on nova side (and also on neutron). > As far as I understand, it's done using a specific rabbit queue, which > can work while others queues are not working. > The purpose of adding ping endpoint here is to be able to ping in all > topics, not only those used for healthcheck reports. > > Also, as mentionned by Thierry, what we need is a way to externally > do pings toward neutron agents and nova computes. > The patch itself is not going to add any load on rabbit. It really > depends on the way the operator will use it. > On my side, I built a small external oslo.messaging script which I can > use to do such pings. > > Cheers, > > -- > Arnaud Morin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Wed Aug 12 07:49:10 2020 From: melwittt at gmail.com (melanie witt) Date: Wed, 12 Aug 2020 00:49:10 -0700 Subject: [gate][keystone][nova][neutron] *-grenade-multinode jobs failing with UnicodeDecodeError in keystone In-Reply-To: <67a115ba-f80a-ebe5-8689-922e3bbb9a40@gmail.com> References: <45926788-6dcf-8825-5bfd-b6353b5facf0@gmail.com> <67a115ba-f80a-ebe5-8689-922e3bbb9a40@gmail.com> Message-ID: <18572a52-d105-9219-6b19-5fe23f18e3e0@gmail.com> Adding [nova][neutron] since their gates will continue to be blocked until one of the following proposed fixes merges. They are linked inline. On 8/11/20 19:57, melanie witt wrote: > On 8/11/20 14:53, melanie witt wrote: >> Howdy all, >> >> FYI the *-grenade-multinode gate jobs are currently failing with the >> following error in keystone: >> >>    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in >> position 3: invalid start byte >> >> This appears to be an issue with a new default data format in msgpack >> v1.0 [1] which was brought in by a recent bump of upper constraints [2]. >> >> *-grenade-multinode jobs are affected because they test a rolling >> upgrade where the controller is upgraded to the N release version but >> one compute node is on the N-1 release version. It looks like cached >> keystone tokens being used by the N-1 node are erroring out during >> msgpack unpacking because they are in the old data format and msgpack >> v1.0 has a new default data format. >> >> I've opened a bug [3] about and I'm trying out the following keystone >> patch to fix it: >> >> https://review.opendev.org/745752 >> >> Reviews appreciated. 
I tested ^ with a DNM patch to nova and nova-grenade-multinode passes with it. >> If this is not the best approach or if this affects other projects as >> well, alternatively we could revert the upper constraint bump to >> msgpack v1.0 while we figure out the best fix. > > Here's a patch for reverting the upper constraint for msgpack in case > that approach is preferred: > > https://review.opendev.org/745769 And this reqs pin ^ is also available if the reviewers find the keystone patch unsuitable. >> [1] >> https://github.com/msgpack/msgpack-python/blob/v1.0.0/README.md#major-breaking-changes-in-msgpack-10 >> >> [2] https://review.opendev.org/#/c/745437/2/upper-constraints.txt at 373 >> [3] https://launchpad.net/bugs/1891244 >> > From zhangbailin at inspur.com Wed Aug 12 08:14:37 2020 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Wed, 12 Aug 2020 08:14:37 +0000 Subject: [oslo.cache][keystonemiddleware] enable-sasl-protocol Message-ID: <35f020916eb54189a6b4176deb3a2a48@inspur.com> Hi all, we would like to enable sasl protocol to oslo.cache and keystonemiddleware project to improve the security of authority. SASL(Simple Authentication and Security Layer): is a memchanism used to extend the verification ability of C/S mode. SASL is only the authentication process, which integrates the application layer and the system authentication mechanism. Need to review patches: https://review.opendev.org/#/q/status:open++branch:master+topic:bp/enable-sasl-protocol brinzhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Wed Aug 12 09:20:32 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 12 Aug 2020 11:20:32 +0200 Subject: [gate][keystone][nova][neutron] *-grenade-multinode jobs failing with UnicodeDecodeError in keystone In-Reply-To: <18572a52-d105-9219-6b19-5fe23f18e3e0@gmail.com> References: <45926788-6dcf-8825-5bfd-b6353b5facf0@gmail.com> <67a115ba-f80a-ebe5-8689-922e3bbb9a40@gmail.com> <18572a52-d105-9219-6b19-5fe23f18e3e0@gmail.com> Message-ID: <20200812092032.jcwjmy4yci6rjbzd@skaplons-mac> Hi, Thx Melanie for the proposed fix for this issue. On Wed, Aug 12, 2020 at 12:49:10AM -0700, melanie witt wrote: > Adding [nova][neutron] since their gates will continue to be blocked until > one of the following proposed fixes merges. They are linked inline. > > On 8/11/20 19:57, melanie witt wrote: > > On 8/11/20 14:53, melanie witt wrote: > > > Howdy all, > > > > > > FYI the *-grenade-multinode gate jobs are currently failing with the > > > following error in keystone: > > > > > >    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in > > > position 3: invalid start byte > > > > > > This appears to be an issue with a new default data format in > > > msgpack v1.0 [1] which was brought in by a recent bump of upper > > > constraints [2]. > > > > > > *-grenade-multinode jobs are affected because they test a rolling > > > upgrade where the controller is upgraded to the N release version > > > but one compute node is on the N-1 release version. It looks like > > > cached keystone tokens being used by the N-1 node are erroring out > > > during msgpack unpacking because they are in the old data format and > > > msgpack v1.0 has a new default data format. > > > > > > I've opened a bug [3] about and I'm trying out the following > > > keystone patch to fix it: > > > > > > https://review.opendev.org/745752 > > > > > > Reviews appreciated. 
> > I tested ^ with a DNM patch to nova and nova-grenade-multinode passes with > it. > > > > If this is not the best approach or if this affects other projects > > > as well, alternatively we could revert the upper constraint bump to > > > msgpack v1.0 while we figure out the best fix. > > > > Here's a patch for reverting the upper constraint for msgpack in case > > that approach is preferred: > > > > https://review.opendev.org/745769 > > And this reqs pin ^ is also available if the reviewers find the keystone > patch unsuitable. > > > > [1] https://github.com/msgpack/msgpack-python/blob/v1.0.0/README.md#major-breaking-changes-in-msgpack-10 > > > > > > [2] https://review.opendev.org/#/c/745437/2/upper-constraints.txt at 373 > > > [3] https://launchpad.net/bugs/1891244 > > > > > > > -- Slawek Kaplonski Senior software engineer Red Hat From dev.faz at gmail.com Wed Aug 12 10:14:31 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 12 Aug 2020 12:14:31 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: Hi, just wrote some small scripts to reproduce our issue and send a msg to rabbitmq-list. https://groups.google.com/d/msg/rabbitmq-users/eC8jc-YEt8s/s8K_0KnXDQAJ Fabian Am Di., 11. Aug. 2020 um 12:31 Uhr schrieb Thierry Carrez < thierry at openstack.org>: > If you can reproduce it with current versions, I would suggest to file > an issue on https://github.com/rabbitmq/rabbitmq-server/issues/ > > The behavior you describe seems to match > https://github.com/rabbitmq/rabbitmq-server/issues/1873 but the > maintainers seem to think it's been fixed by a number of > somewhat-related changes in 3.7.13, because nobody reported issues > anymore :) > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Aug 12 10:32:27 2020 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 12 Aug 2020 12:32:27 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> Message-ID: <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> Sean Mooney wrote: > On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: >> I wonder if this does help though. It seems like a bug that a nova-compute service would stop processing messages and still be seen as up in the service status. Do we understand why that is happening? If not, I'm unclear that a ping living at the oslo.messaging layer is going to do a better job of exposing such an outage. The fact that oslo.messaging is responding does not necessarily equate to nova-compute functioning as expected. >> >> To be clear, this is not me nacking the ping feature. I just want to make sure we understand what is going on here so we don't add another unreliable healthchecking mechanism to the one we already have. > [...] > im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug that is motiviting the creation of this oslo ping > feature but that feels premature if it is. i think it would be better try to adress this by the sender recreating the > queue if the deliver fails and if that is not viable then protpyope thge fix in nova. 
if the self ping fixes this > miss queue error then we could extract the cod into oslo. I think this is missing the point... This is not about working around a specific bug, it's about adding a way to detect a certain class of failure. It's more of an operational feature than a development bugfix. If I understood correctly, OVH is running that patch in production as a way to detect certain problems they regularly run into, something our existing monitor mechanisms fail to detect. That sounds like a worthwhile addition? Alternatively, if we can monitor the exact same class of failures using our existing systems (or by improving them rather than adding a new door), that works too. -- Thierry Carrez (ttx) From moguimar at redhat.com Wed Aug 12 10:52:11 2020 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Wed, 12 Aug 2020 12:52:11 +0200 Subject: [oslo.cache][keystonemiddleware] enable-sasl-protocol In-Reply-To: <35f020916eb54189a6b4176deb3a2a48@inspur.com> References: <35f020916eb54189a6b4176deb3a2a48@inspur.com> Message-ID: Hi Brin, Thanks for the patches! I've dropped a few reviews already. Feel free to reach me also on #openstack-oslo if you have any questions. moguimar On Wed, Aug 12, 2020 at 10:16 AM Brin Zhang(张百林) wrote: > Hi all, we would like to enable sasl protocol to oslo.cache and > keystonemiddleware > > project to improve the security of authority. > > SASL(Simple Authentication and Security Layer): is a memchanism used to > extend the verification ability of C/S mode. SASL is only the > authentication process, which integrates the application layer and the > system authentication mechanism. > > > > Need to review patches: > https://review.opendev.org/#/q/status:open++branch:master+topic:bp/enable-sasl-protocol > > > > > > > > brinzhang > > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Aug 12 11:05:28 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 12 Aug 2020 12:05:28 +0100 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> Message-ID: <4fc5e9d57172a73608e8fdf7e70ff569dca5dfd4.camel@redhat.com> On Wed, 2020-08-12 at 12:32 +0200, Thierry Carrez wrote: > Sean Mooney wrote: > > On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: > > > I wonder if this does help though. It seems like a bug that a nova-compute service would stop processing messages > > > and still be seen as up in the service status. Do we understand why that is happening? If not, I'm unclear that a > > > ping living at the oslo.messaging layer is going to do a better job of exposing such an outage. The fact that > > > oslo.messaging is responding does not necessarily equate to nova-compute functioning as expected. > > > > > > To be clear, this is not me nacking the ping feature. I just want to make sure we understand what is going on here > > > so we don't add another unreliable healthchecking mechanism to the one we already have. > > > > [...] > > im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug that is motiviting the creation of this oslo > > ping > > feature but that feels premature if it is. 
i think it would be better try to adress this by the sender recreating > > the > > queue if the deliver fails and if that is not viable then protpyope thge fix in nova. if the self ping fixes this > > miss queue error then we could extract the cod into oslo. > > I think this is missing the point... This is not about working around a > specific bug, it's about adding a way to detect a certain class of > failure. It's more of an operational feature than a development bugfix.

Right, but we are concerned that there will be a negative performance impact to adding it, and it won't detect the one bug we are aware of of this type in a way that we could not also detect by using the mandatory flag. Nova already has a heartbeat that the agents send to the conductor to report they are still alive. This ping would work in the opposite direction by reaching out to the compute node over the RPC bus, but that would only detect the failure mode if the ping uses the topic queue, and it could only fix it if recreating the queue via the conductor is a viable solution. If it is, using the mandatory flag and just recreating it is a better solution, since we don't need to ping constantly in the background: if we get the exception we create the queue and retransmit. If the compute manager does not resubscribe to the topic when the queue is recreated automatically, then the new ping feature won't really help. We would need the compute service, or any other service that subscribes to the topic queue, to try to ping its own topic queue and, if that fails, recreate the subscription/queue. As far as I am aware that is not what the feature is proposing.

> > If I understood correctly, OVH is running that patch in production as a > way to detect certain problems they regularly run into, something our > existing monitor mechanisms fail to detect. That sounds like a > worthwhile addition?

I'm not sure what failure mode it will detect. If they can define that, it would help with understanding whether this is worthwhile or not.

> > Alternatively, if we can monitor the exact same class of failures using > our existing systems (or by improving them rather than adding a new > door), that works too.

We can monitor the existence of the queue at least from the rabbitmq API (it is disabled by default, but just enable the rabbitmq-management plugin), but I'm not sure what their current issue, the one this is trying to solve, actually is.

> From radoslaw.piliszek at gmail.com Wed Aug 12 11:45:06 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 12 Aug 2020 13:45:06 +0200 Subject: [tc][oslo] Move etcd3gw to OpenStack? Message-ID: Hey, Folks! I see it has been kinda proposed already [1] so that's mostly why I am asking about that now. From what I understand, etcd3gw is our best bet when trying to get coordination with etcd3. However, it has recently released a broken release [2] due to no testing (not to mention gating with tooz). I think it could benefit from OpenDev's existing tooling. And since OpenStack is an important client of it, and OpenStack prefers this client, it might be wise to put it in that namespace already. I guess the details would have to be discussed with dims (current owner) himself, but he seemed happy about it in [1]. I'm notifying oslo as well, as this would probably live best finally under oslo governance. Please let me know if any of the above is not true, so that I can amend my knowledge.
:-) [1] https://github.com/dims/etcd3-gateway/issues/29 [2] https://bugs.launchpad.net/kolla-ansible/+bug/1891314 -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwm2012 at gmail.com Wed Aug 12 13:03:17 2020 From: pwm2012 at gmail.com (pwm) Date: Wed, 12 Aug 2020 21:03:17 +0800 Subject: DNS server instead of /etc/hosts file on Infra Server In-Reply-To: References: Message-ID: Hi, Plan to use PowerDNS server instead of the /etc/hosts file for resolving. Has anyone done this before? The PowerDNS support MySQL DB backend and a frontend GUI PowerDNS Admin which allows centralized easy maintenance. Thanks On Sun, Aug 9, 2020 at 11:54 PM pwm wrote: > Hi, > Anyone interested in replacing the /etc/hosts file entry with a DNS server > on the openstack-ansible deployment? > > Thank you > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Wed Aug 12 13:23:09 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 12 Aug 2020 08:23:09 -0500 Subject: [Release-job-failures] Release of openstack/python-glanceclient for ref refs/tags/3.1.2 failed In-Reply-To: References: Message-ID: On 8/12/20 4:36 AM, zuul at openstack.org wrote: > Build failed. > > - openstack-upload-github-mirror https://zuul.opendev.org/t/openstack/build/cba5dda29e8744059d637a97f358c59f : SUCCESS in 43s > - release-openstack-python https://zuul.opendev.org/t/openstack/build/b9628cf4f28d4bea95844539295ff520 : SUCCESS in 2m 54s > - announce-release https://zuul.opendev.org/t/openstack/build/9f9a5910815247049bf02ab612781620 : FAILURE in 24m 20s > - propose-update-constraints https://zuul.opendev.org/t/openstack/build/44353ec832794af2965e7e2d05d63442 : SUCCESS in 3m 28s announce-release job appears to have failed due to a temporary network issue accessing PyPi packages. Since the release announcement is not critical, no further action is needed. Sean From jonathan.rosser at rd.bbc.co.uk Wed Aug 12 13:28:17 2020 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Wed, 12 Aug 2020 14:28:17 +0100 Subject: DNS server instead of /etc/hosts file on Infra Server In-Reply-To: References: Message-ID: <7db29753-4710-a979-fe71-67a829fa55c3@rd.bbc.co.uk> Openstack-Ansible already supports optionally using the unbound dns server instead of managing /etc/hosts. Join #openstack-ansible on IRC if you need any help. Regards, Jonathan. On 12/08/2020 14:03, pwm wrote: > Hi, > Plan to use PowerDNS server instead of the /etc/hosts file for > resolving. Has anyone done this before? > The PowerDNS support MySQL DB backend and a frontend GUI PowerDNS > Admin which allows centralized easy maintenance. > > Thanks > > On Sun, Aug 9, 2020 at 11:54 PM pwm > wrote: > > Hi, > Anyone interested in replacing the /etc/hosts file entry with a > DNS server on the openstack-ansible deployment? > > Thank you > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Wed Aug 12 14:10:46 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 12 Aug 2020 10:10:46 -0400 Subject: [tc] monthly meeting summary Message-ID: Hi everyone, Here’s a summary of what happened in our TC monthly meeting last Thursday, August 6. # ATTENDEES (LINES SAID) - mnaser (100) - gmann (43) - diablo_rojo (20) - jungleboyj (16) - belmoreira (10) - evrardjp (8) - fungi (6) - zaneb (4) - knikolla (4) - njohnston (3) # MEETING SUMMARY 1. Rollcall (mnaser, 14:00:21) 2. 
Follow up on past action items (mnaser, 14:02:16) - https://review.opendev.org/#/c/744995/ (diablo_rojo, 14:03:14) - http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016336.html (gmann, 14:03:24) 3. OpenStack User-facing APIs and CLIs (belmoreira) (mnaser, 14:25:52) 4. W cycle goal selection start (mnaser, 14:34:39) 5. Completion of retirement cleanup (gmann) (mnaser, 14:40:48) - https://etherpad.opendev.org/p/tc-retirement-cleanup (mnaser, 14:41:02) - https://review.opendev.org/#/c/739291/1 (gmann, 14:42:17) - https://review.opendev.org/#/q/topic:cleanup-retirement+(status:open+OR+status:merged) (gmann, 14:42:51) # ACTION ITEMS - TC members to follow up and review "Resolution to define distributed leadership for projects" - mnaser schedule session with sig-arch and k8s steering committee - gmann continue to audit and clean-up tags - mnaser propose change to implement weekly meetings - njohnston and mugsie to work on getting goals groomed/proposed for W cycle - belmoreira start discussion around openstack user-facing apis & clis - gmann to merge changes to properly retire projects To read the full logs of the meeting, please refer to http://eavesdrop.openstack.org/meetings/tc/2020/tc.2020-08-06-14.00.log.html Thank you, Mohammed -- Mohammed Naser VEXXHOST, Inc. From mnaser at vexxhost.com Wed Aug 12 14:22:53 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 12 Aug 2020 10:22:53 -0400 Subject: [largescale-sig] RPC ping In-Reply-To: <20200806141132.GO31915@sync> References: <20200727095744.GK31915@sync> <20200806141132.GO31915@sync> Message-ID: On Thu, Aug 6, 2020 at 10:11 AM Arnaud Morin wrote: > > Hi Mohammed, > > 1 - That's something we would also like, but it's beyond the patch I > propose. > I need this patch not only for kubernetes, but also for monitoring my > legagy openstack agents running outside of k8s. > > 2 - Yes, latest version of rabbitmq is better on that point, but we > still see some weird issue (I will ask the community about it in another > topic). > > 3 - Thanks for this operator, we'll take a look! > By saying 1 rabbit per service, I understand 1 server, not 1 cluster, > right? > That sounds risky if you lose the server. The controllers are pretty stable and if a controller dies, Kubernetes will take care of restarting the pod somewhere else and everything will reconnect and things will be happy again. > I suppose you dont do that for the database? One database cluster per service, with 'old-school' replication because no one really does true multimaster in Galera with OpenStack anyways. > 4 - Nice, how to you monitor those consumptions? Using rabbit management > API? Prometheus RabbitMQ exporter, now migrating to the native one shipping in the new RabbitMQ releases. > Cheers, > > -- > Arnaud Morin > > On 03.08.20 - 10:21, Mohammed Naser wrote: > > I have a few operational suggestions on how I think we could do this best: > > > > 1. I think exposing a healthcheck endpoint that _actually_ runs the > > ping and responds with a 200 OK makes a lot more sense in terms of > > being able to run it inside something like Kubernetes, you end up with > > a "who makes the ping and who responds to it" type of scenario which > > can be tricky though I'm sure we can figure that out > > 2. I've found that newer releases of RabbitMQ really help with those > > un-usable queues after a split, I haven't had any issues at all with > > newer releases, so that could be something to help your life be a lot > > easier. > > 3. 
You mentioned you're moving towards Kubernetes, we're doing the > > same and building an operator: > > https://opendev.org/vexxhost/openstack-operator -- Because the > > operator manages the whole thing and Kubernetes does it's thing too, > > we started moving towards 1 (single) rabbitmq per service, which > > reaaaaaaally helped a lot in stabilizing things. Oslo messaging is a > > lot better at recovering when a single service IP is pointing towards > > it because it doesn't do weird things like have threads trying to > > connect to other Rabbit ports. Just a thought. > > 4. In terms of telemetry and making sure you avoid that issue, we > > track the consumption rates of queues inside OpenStack. OpenStack > > consumption rate should be constant and never growing, anytime it > > grows, we instantly detect that something is fishy. However, the > > other issue comes in that when you restart any openstack service, it > > 'forgets' all it's existing queues and then you have a set of building > > up queues until they automatically expire which happens around 30 > > minutes-ish, so it makes that alarm of "things are not being consumed" > > a little noisy if you're restarting services > > > > Sorry for the wall of super unorganized text, all over the place here > > but thought I'd chime in with my 2 cents :) > > > > On Mon, Jul 27, 2020 at 6:04 AM Arnaud Morin wrote: > > > > > > Hey all, > > > > > > TLDR: I propose a change to oslo_messaging to allow doing a ping over RPC, > > > this is useful to monitor liveness of agents. > > > > > > > > > Few weeks ago, I proposed a patch to oslo_messaging [1], which is adding a > > > ping endpoint to RPC dispatcher. > > > It means that every openstack service which is using oslo_messaging RPC > > > endpoints (almosts all OpenStack services and agents - e.g. neutron > > > server + agents, nova + computes, etc.) will then be able to answer to a > > > specific "ping" call over RPC. > > > > > > I decided to propose this patch in my company mainly for 2 reasons: > > > 1 - we are struggling monitoring our nova compute and neutron agents in a > > > correct way: > > > > > > 1.1 - sometimes our agents are disconnected from RPC, but the python process > > > is still running. > > > 1.2 - sometimes the agent is still connected, but the queue / binding on > > > rabbit cluster is not working anymore (after a rabbit split for > > > example). This one is very hard to debug, because the agent is still > > > reporting health correctly on neutron server, but it's not able to > > > receive messages anymore. > > > > > > > > > 2 - we are trying to monitor agents running in k8s pods: > > > when running a python agent (neutron l3-agent for example) in a k8s pod, we > > > wanted to find a way to monitor if it is still live of not. > > > > > > > > > Adding a RPC ping endpoint could help us solve both these issues. > > > Note that we still need an external mechanism (out of OpenStack) to do this > > > ping. > > > We also think it could be nice for other OpenStackers, and especially > > > large scale ops. > > > > > > Feel free to comment. > > > > > > > > > [1] https://review.opendev.org/#/c/735385/ > > > > > > > > > -- > > > Arnaud Morin > > > > > > > > > > > > -- > > Mohammed Naser > > VEXXHOST, Inc. -- Mohammed Naser VEXXHOST, Inc. 
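As a rough illustration of the queue-level tracking described in point 4 above, a check against the RabbitMQ management plugin's HTTP API could look like the sketch below. Host, credentials and the backlog threshold are placeholders, not anyone's production values:

import requests

RABBIT_API = 'http://rabbit.example.com:15672/api/queues/%2F'  # %2F = default vhost
AUTH = ('monitoring', 'secret')  # a read-only management user

def find_suspect_queues(backlog_threshold=100):
    suspects = []
    for queue in requests.get(RABBIT_API, auth=AUTH, timeout=10).json():
        messages = queue.get('messages') or 0
        consumers = queue.get('consumers') or 0
        # A queue that keeps messages around with no consumers usually means
        # an agent lost its subscription (or the binding is gone) even though
        # its process still looks alive.
        if consumers == 0 or messages > backlog_threshold:
            suspects.append((queue['name'], messages, consumers))
    return suspects

if __name__ == '__main__':
    for name, messages, consumers in find_suspect_queues():
        print('%s: %d messages, %d consumers' % (name, messages, consumers))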
From dev.faz at gmail.com Wed Aug 12 14:25:49 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 12 Aug 2020 16:25:49 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: Hi, just could prove, that "durable queues" seem to workaround the issue. If I enable durable queues, im no longer able to reproduce my issue. Afaik durable queues have downsides - esp if a node fails and the queue is not (jet) synced. Anyone information about this? Fabian Am Mi., 12. Aug. 2020 um 12:14 Uhr schrieb Fabian Zimmermann < dev.faz at gmail.com>: > Hi, > > just wrote some small scripts to reproduce our issue and send a msg to > rabbitmq-list. > > https://groups.google.com/d/msg/rabbitmq-users/eC8jc-YEt8s/s8K_0KnXDQAJ > > Fabian > > > Am Di., 11. Aug. 2020 um 12:31 Uhr schrieb Thierry Carrez < > thierry at openstack.org>: > >> If you can reproduce it with current versions, I would suggest to file >> an issue on https://github.com/rabbitmq/rabbitmq-server/issues/ >> >> The behavior you describe seems to match >> https://github.com/rabbitmq/rabbitmq-server/issues/1873 but the >> maintainers seem to think it's been fixed by a number of >> somewhat-related changes in 3.7.13, because nobody reported issues >> anymore :) >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Wed Aug 12 14:30:00 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 12 Aug 2020 10:30:00 -0400 Subject: [tc] weekly summary Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. 
# Patches ## Open Reviews - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Move towards dual office hours https://review.opendev.org/745201 - Clean up expired i18n SIG extra-ATCs https://review.opendev.org/745565 - Resolution to define distributed leadership for projects https://review.opendev.org/744995 - Move towards single office hour https://review.opendev.org/745200 - Add legacy repository validation https://review.opendev.org/737559 - Drop all exceptions for legacy validation https://review.opendev.org/745403 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 - Clean up expired i18n SIG extra-ATCs https://review.opendev.org/745565 - Add legacy repository validation https://review.opendev.org/737559 - Pierre Riteau as CloudKitty PTL for Victoria https://review.opendev.org/745653 - Resolution to define distributed leadership for projects https://review.opendev.org/744995 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Move towards dual office hours https://review.opendev.org/745201 - Move towards single office hour https://review.opendev.org/745200 ## Project Updates - Deprecate os_congress project https://review.opendev.org/742533 - Add Ceph iSCSI charm to OpenStack charms https://review.opendev.org/744480 - Add Keystone Kerberos charm to OpenStack charms https://review.opendev.org/743769 - Add python-dracclient to be owned by Hardware Vendor SIG https://review.opendev.org/745564 ## General Changes - Reverse sort series in selected goals https://review.opendev.org/744897 - Declare supported runtimes for Wallaby release https://review.opendev.org/743847 - Sort SIG names in repo owner list https://review.opendev.org/745563 - Drop neutron-vpnaas from legacy projects https://review.opendev.org/745401 ## Abandoned Changes - Migrate testing to ubuntu focal https://review.opendev.org/740851 # Email Threads - CloudKitty Status: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016171.html - OSC vs python-*clients: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016409.html - Proposed Wallaby Schedule: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016391.html - New Office Hour Plans: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016372.html # Other Reminders - Cycle-Trailing Release Deadline Aug 13 Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From dev.faz at gmail.com Wed Aug 12 15:03:40 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 12 Aug 2020 17:03:40 +0200 Subject: [largescale-sig] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <20200806141132.GO31915@sync> Message-ID: Hi, Am Mi., 12. Aug. 2020 um 16:30 Uhr schrieb Mohammed Naser < mnaser at vexxhost.com>: > On Thu, Aug 6, 2020 at 10:11 AM Arnaud Morin > wrote: > The controllers are pretty stable and if a controller dies, Kubernetes > will take care of restarting the pod somewhere else and everything > will reconnect and things will be happy again. > sounds really interesting. Do you have any docs how to use / do a poc of this setup? Fabian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sean.mcginnis at gmx.com Wed Aug 12 15:21:31 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 12 Aug 2020 10:21:31 -0500 Subject: [all] Proposed Wallaby cycle schedule In-Reply-To: <2e56de68-c416-e3ea-f3da-caaf9399287d@gmx.com> References: <2e56de68-c416-e3ea-f3da-caaf9399287d@gmx.com> Message-ID: <0083db2a-0ef7-99fa-0c45-fd170f7d7902@gmx.com> > > The current thinking is it will likely take place in May (nothing is > set, just an educated guess, so please don't use that for any other > planning). So for the sake of figuring out the release schedule, we are > targeting a release date in early May. Hopefully this will then align > well with event plans. > > I have a proposed release schedule up for review here: > > https://review.opendev.org/#/c/744729/ > > For ease of viewing (until the job logs are garbage collected), you can > see the rendered schedule here: > > https://0e6b8aeca433e85b429b-46fd243db6dc394538bd0555f339eba5.ssl.cf1.rackcdn.com/744729/3/check/openstack-tox-docs/4f76901/docs/wallaby/schedule.html > > > There are always outside conflicts, but I think this has aligned mostly > well with major holidays. But please feel free to comment on the patch > if you see any major issues that we may have not considered. > One more update to this. Some concerns were raised around alignment with the planned Ubuntu release schedule. Plus some general sentiment for wanting to be closer to a 6 month schedule. As an alternative option, I have proposed a 26 week option: https://review.opendev.org/#/c/745911/ This would mean there would be a largish gap between when the X release starts and when we might hold the PTG for that development. That could be good or bad. Depending on the in-person event situation, it is also unknown if we would need to wait for a larger scheduled event, or if we would be able to hold a virtual event sooner. So lot's of unknowns. Getting community feedback on these options would be useful. If one schedule or the other seems better to you, please add comments to the patches. Here is a rendered schedule for the 26 week option: https://f7c086752d1ed6ae5f02-3fd01ef5e4a590ae96edf7e9bfcef60c.ssl.cf1.rackcdn.com/745911/2/check/openstack-tox-docs/5be238a/docs/wallaby/schedule.html Thanks! Sean From openstack at nemebean.com Wed Aug 12 15:50:21 2020 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 12 Aug 2020 10:50:21 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> Message-ID: <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> On 8/12/20 5:32 AM, Thierry Carrez wrote: > Sean Mooney wrote: >> On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: >>> I wonder if this does help though. It seems like a bug that a >>> nova-compute service would stop processing messages and still be seen >>> as up in the service status. Do we understand why that is happening? >>> If not, I'm unclear that a ping living at the oslo.messaging layer is >>> going to do a better job of exposing such an outage. The fact that >>> oslo.messaging is responding does not necessarily equate to >>> nova-compute functioning as expected. >>> >>> To be clear, this is not me nacking the ping feature. 
I just want to >>> make sure we understand what is going on here so we don't add another >>> unreliable healthchecking mechanism to the one we already have. >> [...] >> im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug >> that is motiviting the creation of this oslo ping >> feature but that feels premature if it is. i think it would be better >> try to adress this by the sender recreating the >> queue if the deliver fails and if that is not viable then protpyope >> thge fix in nova. if the self ping fixes this >> miss queue error then we could extract the cod into oslo. > > I think this is missing the point... This is not about working around a > specific bug, it's about adding a way to detect a certain class of > failure. It's more of an operational feature than a development bugfix. > > If I understood correctly, OVH is running that patch in production as a > way to detect certain problems they regularly run into, something our > existing monitor mechanisms fail to detect. That sounds like a > worthwhile addition? Okay, I don't think I was aware that this was already being used. If someone already finds it useful and it's opt-in then I'm not inclined to block it. My main concern was that we were adding a feature that didn't actually address the problem at hand. I _would_ feel better about it if someone could give an example of a type of failure this is detecting that is missed by other monitoring methods though. Both because having a concrete example of a use case for the feature is good, and because if it turns out that the problems this is detecting are things like the Nova bug Sean is talking about (which I don't think this would catch anyway, since the topic is missing and there's nothing to ping) then there may be other changes we can/should make to improve things. > > Alternatively, if we can monitor the exact same class of failures using > our existing systems (or by improving them rather than adding a new > door), that works too. > From thierry at openstack.org Wed Aug 12 16:14:58 2020 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 12 Aug 2020 18:14:58 +0200 Subject: [largescale-sig] Next meeting: August 12, 16utc In-Reply-To: <6e7a4e43-08f4-3030-2eb0-9311f27d9647@openstack.org> References: <6e7a4e43-08f4-3030-2eb0-9311f27d9647@openstack.org> Message-ID: We just held the meeting, it was very short, as only mdelavergne and myself were on. None of the expected US-based recruits joined. We'll likely have to beat a larger drum for the next US-EU meeting in 4 weeks. Meeting logs at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-08-12-16.00.html TODOs: - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation Next meetings: Aug 26, 8:00UTC; Sep 9, 16:00UTC (#openstack-meeting-3) -- Thierry Carrez (ttx) From mnaser at vexxhost.com Wed Aug 12 16:56:12 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 12 Aug 2020 12:56:12 -0400 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez wrote: > > Thomas Goirand wrote: > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: > >> Thanks, Pierre for helping with this. 
> >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) > >> but I am not sure if he got any response back. > > No response so far, but they may all be in company summer vacation. > > > The end of the very good maintenance of Cloudkitty matched the date when > > objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? > > > > This is very disappointing as I've been using it for some time already, > > and that I was satisfied by it (ie: it does the job...), and especially > > that latest releases are able to scale correctly. > > > > I very much would love if Pierre Riteau was successful in taking over. > > Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. > > Given the volunteers (Pierre, Rafael, Luis) I would support the TC using > its unholy powers to add extra core reviewers to cloudkitty. https://review.opendev.org/#/c/745653 is currently merging and fungi will be adding Pierre as a core. Thank you for helping. > If the current PTL comes back, I'm sure they will appreciate the help, > and can always fix/revert things before Victoria release. > > -- > Thierry Carrez (ttx) > -- Mohammed Naser VEXXHOST, Inc. From openstack at nemebean.com Wed Aug 12 17:02:34 2020 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 12 Aug 2020 12:02:34 -0500 Subject: [tc][oslo] Move etcd3gw to OpenStack? In-Reply-To: References: Message-ID: <0c77a128-acea-75f3-faa2-b3d79c3991aa@nemebean.com> I'm fine with this for all the reasons mentioned below. It's not a high volume project so it shouldn't be a big problem to bring it into Oslo. Plus it would give us an excuse to make dims an Oslo core again. ;-) On 8/12/20 6:45 AM, Radosław Piliszek wrote: > Hey, Folks! > > I see it has been kinda proposed already [1] so that's mostly why I am > asking about that now. > > From what I understand, etcd3gw is our best bet when trying to get > coordination with etcd3. > However, it has recently released a broken release [2] due to no testing > (not to mention gating with tooz). > I think it could benefit from OpenDev's existing tooling. > And since OpenStack is an important client of it and OpenStack > preferring this client, it might be wise to put it in that namespace > already. > > I guess the details would have to be discussed with dims (current owner) > himself but he seemed happy about it in [1]. > > I'm notifying oslo as well as this would probably live best finally > under oslo governance. > > Please let me know if any of the above is not true, so that I can amend > my knowledge. :-) > > [1] https://github.com/dims/etcd3-gateway/issues/29 > [2] https://bugs.launchpad.net/kolla-ansible/+bug/1891314 > > -yoctozepto > From rafaelweingartner at gmail.com Wed Aug 12 17:06:53 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 12 Aug 2020 14:06:53 -0300 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Awesome! Thank you guys for the help. We have few PRs open there that are ready (or close to be ready) to be merged. On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser wrote: > On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez > wrote: > > > > Thomas Goirand wrote: > > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: > > >> Thanks, Pierre for helping with this. > > >> > > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) < > justin.ferrieu at objectif-libre.com>) > > >> but I am not sure if he got any response back. 
> > > > No response so far, but they may all be in company summer vacation. > > > > > The end of the very good maintenance of Cloudkitty matched the date > when > > > objectif libre was sold to Linkbynet. Maybe the new owner don't care > enough? > > > > > > This is very disappointing as I've been using it for some time already, > > > and that I was satisfied by it (ie: it does the job...), and especially > > > that latest releases are able to scale correctly. > > > > > > I very much would love if Pierre Riteau was successful in taking over. > > > Good luck Pierre! I'll try to help whenever I can and if I'm not too > busy. > > > > Given the volunteers (Pierre, Rafael, Luis) I would support the TC using > > its unholy powers to add extra core reviewers to cloudkitty. > > https://review.opendev.org/#/c/745653 is currently merging and fungi will > be > adding Pierre as a core. > > Thank you for helping. > > > If the current PTL comes back, I'm sure they will appreciate the help, > > and can always fix/revert things before Victoria release. > > > > -- > > Thierry Carrez (ttx) > > > > > -- > Mohammed Naser > VEXXHOST, Inc. > > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Wed Aug 12 20:37:05 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Wed, 12 Aug 2020 22:37:05 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: I have now received core reviewer privileges. Thank you to TC members for trusting us with the CloudKitty project. I would like to kick things off by resuming IRC meetings. They're set to run every two weeks (on odd weeks) on Monday at 1400 UTC in #cloudkitty. Is this a convenient time slot for all potential contributors to the project? On Wed, 12 Aug 2020 at 19:08, Rafael Weingärtner wrote: > > Awesome! Thank you guys for the help. > We have few PRs open there that are ready (or close to be ready) to be merged. > > On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser wrote: >> >> On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez wrote: >> > >> > Thomas Goirand wrote: >> > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: >> > >> Thanks, Pierre for helping with this. >> > >> >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) >> > >> but I am not sure if he got any response back. >> > >> > No response so far, but they may all be in company summer vacation. >> > >> > > The end of the very good maintenance of Cloudkitty matched the date when >> > > objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? >> > > >> > > This is very disappointing as I've been using it for some time already, >> > > and that I was satisfied by it (ie: it does the job...), and especially >> > > that latest releases are able to scale correctly. >> > > >> > > I very much would love if Pierre Riteau was successful in taking over. >> > > Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. >> > >> > Given the volunteers (Pierre, Rafael, Luis) I would support the TC using >> > its unholy powers to add extra core reviewers to cloudkitty. >> >> https://review.opendev.org/#/c/745653 is currently merging and fungi will be >> adding Pierre as a core. >> >> Thank you for helping. >> >> > If the current PTL comes back, I'm sure they will appreciate the help, >> > and can always fix/revert things before Victoria release. 
>> > >> > -- >> > Thierry Carrez (ttx) >> > >> >> >> -- >> Mohammed Naser >> VEXXHOST, Inc. >> > > > -- > Rafael Weingärtner From rafaelweingartner at gmail.com Wed Aug 12 20:40:57 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 12 Aug 2020 17:40:57 -0300 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Sounds good to me. On Wed, Aug 12, 2020 at 5:37 PM Pierre Riteau wrote: > I have now received core reviewer privileges. Thank you to TC members > for trusting us with the CloudKitty project. > > I would like to kick things off by resuming IRC meetings. They're set > to run every two weeks (on odd weeks) on Monday at 1400 UTC in > #cloudkitty. Is this a convenient time slot for all potential > contributors to the project? > > On Wed, 12 Aug 2020 at 19:08, Rafael Weingärtner > wrote: > > > > Awesome! Thank you guys for the help. > > We have few PRs open there that are ready (or close to be ready) to be > merged. > > > > On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser > wrote: > >> > >> On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez > wrote: > >> > > >> > Thomas Goirand wrote: > >> > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: > >> > >> Thanks, Pierre for helping with this. > >> > >> > >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) < > justin.ferrieu at objectif-libre.com>) > >> > >> but I am not sure if he got any response back. > >> > > >> > No response so far, but they may all be in company summer vacation. > >> > > >> > > The end of the very good maintenance of Cloudkitty matched the date > when > >> > > objectif libre was sold to Linkbynet. Maybe the new owner don't > care enough? > >> > > > >> > > This is very disappointing as I've been using it for some time > already, > >> > > and that I was satisfied by it (ie: it does the job...), and > especially > >> > > that latest releases are able to scale correctly. > >> > > > >> > > I very much would love if Pierre Riteau was successful in taking > over. > >> > > Good luck Pierre! I'll try to help whenever I can and if I'm not > too busy. > >> > > >> > Given the volunteers (Pierre, Rafael, Luis) I would support the TC > using > >> > its unholy powers to add extra core reviewers to cloudkitty. > >> > >> https://review.opendev.org/#/c/745653 is currently merging and fungi > will be > >> adding Pierre as a core. > >> > >> Thank you for helping. > >> > >> > If the current PTL comes back, I'm sure they will appreciate the help, > >> > and can always fix/revert things before Victoria release. > >> > > >> > -- > >> > Thierry Carrez (ttx) > >> > > >> > >> > >> -- > >> Mohammed Naser > >> VEXXHOST, Inc. > >> > > > > > > -- > > Rafael Weingärtner > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.ramirez at opencloud.es Wed Aug 12 21:38:30 2020 From: luis.ramirez at opencloud.es (Luis Ramirez) Date: Wed, 12 Aug 2020 23:38:30 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Sounds good to me. El El mié, 12 ago 2020 a las 22:41, Rafael Weingärtner < rafaelweingartner at gmail.com> escribió: > Sounds good to me. > > On Wed, Aug 12, 2020 at 5:37 PM Pierre Riteau wrote: > >> I have now received core reviewer privileges. Thank you to TC members >> >> >> for trusting us with the CloudKitty project. 
>> >> >> >> >> >> I would like to kick things off by resuming IRC meetings. They're set >> >> >> to run every two weeks (on odd weeks) on Monday at 1400 UTC in >> >> >> #cloudkitty. Is this a convenient time slot for all potential >> >> >> contributors to the project? >> >> >> >> >> >> On Wed, 12 Aug 2020 at 19:08, Rafael Weingärtner >> >> >> wrote: >> >> >> > >> >> >> > Awesome! Thank you guys for the help. >> >> >> > We have few PRs open there that are ready (or close to be ready) to be >> merged. >> >> >> > >> >> >> > On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser >> wrote: >> >> >> >> >> >> >> >> On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez >> wrote: >> >> >> >> > >> >> >> >> > Thomas Goirand wrote: >> >> >> >> > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: >> >> >> >> > >> Thanks, Pierre for helping with this. >> >> >> >> > >> >> >> >> >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) < >> justin.ferrieu at objectif-libre.com>) >> >> >> >> > >> but I am not sure if he got any response back. >> >> >> >> > >> >> >> >> > No response so far, but they may all be in company summer vacation. >> >> >> >> > >> >> >> >> > > The end of the very good maintenance of Cloudkitty matched the >> date when >> >> >> >> > > objectif libre was sold to Linkbynet. Maybe the new owner don't >> care enough? >> >> >> >> > > >> >> >> >> > > This is very disappointing as I've been using it for some time >> already, >> >> >> >> > > and that I was satisfied by it (ie: it does the job...), and >> especially >> >> >> >> > > that latest releases are able to scale correctly. >> >> >> >> > > >> >> >> >> > > I very much would love if Pierre Riteau was successful in taking >> over. >> >> >> >> > > Good luck Pierre! I'll try to help whenever I can and if I'm not >> too busy. >> >> >> >> > >> >> >> >> > Given the volunteers (Pierre, Rafael, Luis) I would support the TC >> using >> >> >> >> > its unholy powers to add extra core reviewers to cloudkitty. >> >> >> >> >> >> >> >> https://review.opendev.org/#/c/745653 is currently merging and fungi >> will be >> >> >> >> adding Pierre as a core. >> >> >> >> >> >> >> >> Thank you for helping. >> >> >> >> >> >> >> >> > If the current PTL comes back, I'm sure they will appreciate the >> help, >> >> >> >> > and can always fix/revert things before Victoria release. >> >> >> >> > >> >> >> >> > -- >> >> >> >> > Thierry Carrez (ttx) >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Mohammed Naser >> >> >> >> VEXXHOST, Inc. >> >> >> >> >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Rafael Weingärtner >> >> >> > > -- > Rafael Weingärtner > > > -- Br, Luis Rmz Blockchain, DevOps & Open Source Cloud Solutions Architect ---------------------------------------- Founder & CEO OpenCloud.es luis.ramirez at opencloud.es Skype ID: d.overload Hangouts: luis.ramirez at opencloud.es +34 911 950 123 / +39 392 1289553 / +49 152 26917722 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From melwittt at gmail.com Wed Aug 12 23:22:38 2020 From: melwittt at gmail.com (melanie witt) Date: Wed, 12 Aug 2020 16:22:38 -0700 Subject: [gate][keystone][nova][neutron] *-grenade-multinode jobs failing with UnicodeDecodeError in keystone In-Reply-To: <18572a52-d105-9219-6b19-5fe23f18e3e0@gmail.com> References: <45926788-6dcf-8825-5bfd-b6353b5facf0@gmail.com> <67a115ba-f80a-ebe5-8689-922e3bbb9a40@gmail.com> <18572a52-d105-9219-6b19-5fe23f18e3e0@gmail.com> Message-ID: On 8/12/20 00:49, melanie witt wrote: >>> I've opened a bug [3] about and I'm trying out the following keystone >>> patch to fix it: >>> >>> https://review.opendev.org/745752 >>> >>> Reviews appreciated. > > I tested ^ with a DNM patch to nova and nova-grenade-multinode passes > with it. The fix has merged and it is now safe to recheck your patches. Thank you all for the code reviews. Cheers, -melanie >>> [1] >>> https://github.com/msgpack/msgpack-python/blob/v1.0.0/README.md#major-breaking-changes-in-msgpack-10 >>> >>> [2] https://review.opendev.org/#/c/745437/2/upper-constraints.txt at 373 >>> [3] https://launchpad.net/bugs/1891244 From jayadityagupta11 at gmail.com Thu Aug 13 08:03:01 2020 From: jayadityagupta11 at gmail.com (jayaditya gupta) Date: Thu, 13 Aug 2020 10:03:01 +0200 Subject: [openstackclient] Implementing nova migration cmds in OSC Message-ID: Hello , i am trying to implement some nova migrations commands to openstackclient. Commands 1. migration-list : list all migrations ever happened 2. server-migration-list : Get the migrations list of specified server 3. server-migration-show : show currently going on migration of specified server. 4. live migration abort feature 5.live-migration-force-complete Please have a look at this patch : https://review.opendev.org/#/c/742210/ and share your insight, what should be the correct way to implement it ? should it be a root command or part of the openstack server migrate command? Best Regards Jayaditya Gupta -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Aug 13 08:24:26 2020 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 13 Aug 2020 10:24:26 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> Message-ID: Ben Nemec wrote: > On 8/12/20 5:32 AM, Thierry Carrez wrote: >> Sean Mooney wrote: >>> On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: >>>> I wonder if this does help though. It seems like a bug that a >>>> nova-compute service would stop processing messages and still be >>>> seen as up in the service status. Do we understand why that is >>>> happening? If not, I'm unclear that a ping living at the >>>> oslo.messaging layer is going to do a better job of exposing such an >>>> outage. The fact that oslo.messaging is responding does not >>>> necessarily equate to nova-compute functioning as expected. >>>> >>>> To be clear, this is not me nacking the ping feature. I just want to >>>> make sure we understand what is going on here so we don't add >>>> another unreliable healthchecking mechanism to the one we already have. >>> [...] 
>>> im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug >>> that is motiviting the creation of this oslo ping >>> feature but that feels premature if it is. i think it would be better >>> try to adress this by the sender recreating the >>> queue if the deliver fails and if that is not viable then protpyope >>> thge fix in nova. if the self ping fixes this >>> miss queue error then we could extract the cod into oslo. >> >> I think this is missing the point... This is not about working around >> a specific bug, it's about adding a way to detect a certain class of >> failure. It's more of an operational feature than a development bugfix. >> >> If I understood correctly, OVH is running that patch in production as >> a way to detect certain problems they regularly run into, something >> our existing monitor mechanisms fail to detect. That sounds like a >> worthwhile addition? > > Okay, I don't think I was aware that this was already being used. If > someone already finds it useful and it's opt-in then I'm not inclined to > block it. My main concern was that we were adding a feature that didn't > actually address the problem at hand. > > I _would_ feel better about it if someone could give an example of a > type of failure this is detecting that is missed by other monitoring > methods though. Both because having a concrete example of a use case for > the feature is good, and because if it turns out that the problems this > is detecting are things like the Nova bug Sean is talking about (which I > don't think this would catch anyway, since the topic is missing and > there's nothing to ping) then there may be other changes we can/should > make to improve things. Right. Let's wait for Arnaud to come back from vacation and confirm that (1) that patch is not a shot in the dark: it allows them to expose a class of issues in production (2) they fail to expose that same class of issues using other existing mechanisms, including those just suggested in this thread I just wanted to avoid early rejection of this health check ability on the grounds that the situation it exposes should just not happen. Or that, if enabled and heavily used, it would have a performance impact. -- Thierry Carrez (ttx) From pierre at stackhpc.com Thu Aug 13 11:35:42 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 13 Aug 2020 13:35:42 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Thank you both. I've merged a few patches to fix CI and finalise the Ussuri release (for example release notes were missing). I gave core reviewer privileges to Rafael and Luis. Let's try to merge patches with two +2 votes from now on. On Wed, 12 Aug 2020 at 23:38, Luis Ramirez wrote: > > Sounds good to me. > > El El mié, 12 ago 2020 a las 22:41, Rafael Weingärtner escribió: >> >> Sounds good to me. >> >> On Wed, Aug 12, 2020 at 5:37 PM Pierre Riteau wrote: >>> >>> I have now received core reviewer privileges. Thank you to TC members >>> >>> >>> for trusting us with the CloudKitty project. >>> >>> >>> >>> >>> >>> I would like to kick things off by resuming IRC meetings. They're set >>> >>> >>> to run every two weeks (on odd weeks) on Monday at 1400 UTC in >>> >>> >>> #cloudkitty. Is this a convenient time slot for all potential >>> >>> >>> contributors to the project? >>> >>> >>> >>> >>> >>> On Wed, 12 Aug 2020 at 19:08, Rafael Weingärtner >>> >>> >>> wrote: >>> >>> >>> > >>> >>> >>> > Awesome! 
Thank you guys for the help. >>> >>> >>> > We have few PRs open there that are ready (or close to be ready) to be merged. >>> >>> >>> > >>> >>> >>> > On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser wrote: >>> >>> >>> >> >>> >>> >>> >> On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez wrote: >>> >>> >>> >> > >>> >>> >>> >> > Thomas Goirand wrote: >>> >>> >>> >> > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: >>> >>> >>> >> > >> Thanks, Pierre for helping with this. >>> >>> >>> >> > >> >>> >>> >>> >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) >>> >>> >>> >> > >> but I am not sure if he got any response back. >>> >>> >>> >> > >>> >>> >>> >> > No response so far, but they may all be in company summer vacation. >>> >>> >>> >> > >>> >>> >>> >> > > The end of the very good maintenance of Cloudkitty matched the date when >>> >>> >>> >> > > objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? >>> >>> >>> >> > > >>> >>> >>> >> > > This is very disappointing as I've been using it for some time already, >>> >>> >>> >> > > and that I was satisfied by it (ie: it does the job...), and especially >>> >>> >>> >> > > that latest releases are able to scale correctly. >>> >>> >>> >> > > >>> >>> >>> >> > > I very much would love if Pierre Riteau was successful in taking over. >>> >>> >>> >> > > Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. >>> >>> >>> >> > >>> >>> >>> >> > Given the volunteers (Pierre, Rafael, Luis) I would support the TC using >>> >>> >>> >> > its unholy powers to add extra core reviewers to cloudkitty. >>> >>> >>> >> >>> >>> >>> >> https://review.opendev.org/#/c/745653 is currently merging and fungi will be >>> >>> >>> >> adding Pierre as a core. >>> >>> >>> >> >>> >>> >>> >> Thank you for helping. >>> >>> >>> >> >>> >>> >>> >> > If the current PTL comes back, I'm sure they will appreciate the help, >>> >>> >>> >> > and can always fix/revert things before Victoria release. >>> >>> >>> >> > >>> >>> >>> >> > -- >>> >>> >>> >> > Thierry Carrez (ttx) >>> >>> >>> >> > >>> >>> >>> >> >>> >>> >>> >> >>> >>> >>> >> -- >>> >>> >>> >> Mohammed Naser >>> >>> >>> >> VEXXHOST, Inc. >>> >>> >>> >> >>> >>> >>> > >>> >>> >>> > >>> >>> >>> > -- >>> >>> >>> > Rafael Weingärtner >>> >>> >> >> >> -- >> Rafael Weingärtner >> >> > -- > Br, > Luis Rmz > Blockchain, DevOps & Open Source Cloud Solutions Architect > ---------------------------------------- > Founder & CEO > OpenCloud.es > luis.ramirez at opencloud.es > Skype ID: d.overload > Hangouts: luis.ramirez at opencloud.es > +34 911 950 123 / +39 392 1289553 / +49 152 26917722 From rafaelweingartner at gmail.com Thu Aug 13 11:44:20 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 13 Aug 2020 08:44:20 -0300 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Awesome, thanks! I will try to dedicate a few hours every week to review CloudKitty patches. On Thu, Aug 13, 2020 at 8:36 AM Pierre Riteau wrote: > Thank you both. > > I've merged a few patches to fix CI and finalise the Ussuri release > (for example release notes were missing). > I gave core reviewer privileges to Rafael and Luis. Let's try to merge > patches with two +2 votes from now on. > > On Wed, 12 Aug 2020 at 23:38, Luis Ramirez > wrote: > > > > Sounds good to me. 
> > > > El El mié, 12 ago 2020 a las 22:41, Rafael Weingärtner < > rafaelweingartner at gmail.com> escribió: > >> > >> Sounds good to me. > >> > >> On Wed, Aug 12, 2020 at 5:37 PM Pierre Riteau > wrote: > >>> > >>> I have now received core reviewer privileges. Thank you to TC members > >>> > >>> > >>> for trusting us with the CloudKitty project. > >>> > >>> > >>> > >>> > >>> > >>> I would like to kick things off by resuming IRC meetings. They're set > >>> > >>> > >>> to run every two weeks (on odd weeks) on Monday at 1400 UTC in > >>> > >>> > >>> #cloudkitty. Is this a convenient time slot for all potential > >>> > >>> > >>> contributors to the project? > >>> > >>> > >>> > >>> > >>> > >>> On Wed, 12 Aug 2020 at 19:08, Rafael Weingärtner > >>> > >>> > >>> wrote: > >>> > >>> > >>> > > >>> > >>> > >>> > Awesome! Thank you guys for the help. > >>> > >>> > >>> > We have few PRs open there that are ready (or close to be ready) to > be merged. > >>> > >>> > >>> > > >>> > >>> > >>> > On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser > wrote: > >>> > >>> > >>> >> > >>> > >>> > >>> >> On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez < > thierry at openstack.org> wrote: > >>> > >>> > >>> >> > > >>> > >>> > >>> >> > Thomas Goirand wrote: > >>> > >>> > >>> >> > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: > >>> > >>> > >>> >> > >> Thanks, Pierre for helping with this. > >>> > >>> > >>> >> > >> > >>> > >>> > >>> >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) < > justin.ferrieu at objectif-libre.com>) > >>> > >>> > >>> >> > >> but I am not sure if he got any response back. > >>> > >>> > >>> >> > > >>> > >>> > >>> >> > No response so far, but they may all be in company summer > vacation. > >>> > >>> > >>> >> > > >>> > >>> > >>> >> > > The end of the very good maintenance of Cloudkitty matched the > date when > >>> > >>> > >>> >> > > objectif libre was sold to Linkbynet. Maybe the new owner don't > care enough? > >>> > >>> > >>> >> > > > >>> > >>> > >>> >> > > This is very disappointing as I've been using it for some time > already, > >>> > >>> > >>> >> > > and that I was satisfied by it (ie: it does the job...), and > especially > >>> > >>> > >>> >> > > that latest releases are able to scale correctly. > >>> > >>> > >>> >> > > > >>> > >>> > >>> >> > > I very much would love if Pierre Riteau was successful in > taking over. > >>> > >>> > >>> >> > > Good luck Pierre! I'll try to help whenever I can and if I'm > not too busy. > >>> > >>> > >>> >> > > >>> > >>> > >>> >> > Given the volunteers (Pierre, Rafael, Luis) I would support the > TC using > >>> > >>> > >>> >> > its unholy powers to add extra core reviewers to cloudkitty. > >>> > >>> > >>> >> > >>> > >>> > >>> >> https://review.opendev.org/#/c/745653 is currently merging and > fungi will be > >>> > >>> > >>> >> adding Pierre as a core. > >>> > >>> > >>> >> > >>> > >>> > >>> >> Thank you for helping. > >>> > >>> > >>> >> > >>> > >>> > >>> >> > If the current PTL comes back, I'm sure they will appreciate the > help, > >>> > >>> > >>> >> > and can always fix/revert things before Victoria release. > >>> > >>> > >>> >> > > >>> > >>> > >>> >> > -- > >>> > >>> > >>> >> > Thierry Carrez (ttx) > >>> > >>> > >>> >> > > >>> > >>> > >>> >> > >>> > >>> > >>> >> > >>> > >>> > >>> >> -- > >>> > >>> > >>> >> Mohammed Naser > >>> > >>> > >>> >> VEXXHOST, Inc. 
> >>> > >>> > >>> >> > >>> > >>> > >>> > > >>> > >>> > >>> > > >>> > >>> > >>> > -- > >>> > >>> > >>> > Rafael Weingärtner > >>> > >>> > >> > >> > >> -- > >> Rafael Weingärtner > >> > >> > > -- > > Br, > > Luis Rmz > > Blockchain, DevOps & Open Source Cloud Solutions Architect > > ---------------------------------------- > > Founder & CEO > > OpenCloud.es > > luis.ramirez at opencloud.es > > Skype ID: d.overload > > Hangouts: luis.ramirez at opencloud.es > > +34 911 950 123 / +39 392 1289553 / +49 152 26917722 > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Aug 13 12:14:06 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Aug 2020 13:14:06 +0100 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> Message-ID: <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> On Thu, 2020-08-13 at 10:24 +0200, Thierry Carrez wrote: > Ben Nemec wrote: > > On 8/12/20 5:32 AM, Thierry Carrez wrote: > > > Sean Mooney wrote: > > > > On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: > > > > > I wonder if this does help though. It seems like a bug that a > > > > > nova-compute service would stop processing messages and still be > > > > > seen as up in the service status. Do we understand why that is > > > > > happening? If not, I'm unclear that a ping living at the > > > > > oslo.messaging layer is going to do a better job of exposing such an > > > > > outage. The fact that oslo.messaging is responding does not > > > > > necessarily equate to nova-compute functioning as expected. > > > > > > > > > > To be clear, this is not me nacking the ping feature. I just want to > > > > > make sure we understand what is going on here so we don't add > > > > > another unreliable healthchecking mechanism to the one we already have. > > > > > > > > [...] > > > > im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug > > > > that is motiviting the creation of this oslo ping > > > > feature but that feels premature if it is. i think it would be better > > > > try to adress this by the sender recreating the > > > > queue if the deliver fails and if that is not viable then protpyope > > > > thge fix in nova. if the self ping fixes this > > > > miss queue error then we could extract the cod into oslo. > > > > > > I think this is missing the point... This is not about working around > > > a specific bug, it's about adding a way to detect a certain class of > > > failure. It's more of an operational feature than a development bugfix. > > > > > > If I understood correctly, OVH is running that patch in production as > > > a way to detect certain problems they regularly run into, something > > > our existing monitor mechanisms fail to detect. That sounds like a > > > worthwhile addition? > > > > Okay, I don't think I was aware that this was already being used. If > > someone already finds it useful and it's opt-in then I'm not inclined to > > block it. My main concern was that we were adding a feature that didn't > > actually address the problem at hand. 
> > > > I _would_ feel better about it if someone could give an example of a > > type of failure this is detecting that is missed by other monitoring > > methods though. Both because having a concrete example of a use case for > > the feature is good, and because if it turns out that the problems this > > is detecting are things like the Nova bug Sean is talking about (which I > > don't think this would catch anyway, since the topic is missing and > > there's nothing to ping) then there may be other changes we can/should > > make to improve things. > > Right. Let's wait for Arnaud to come back from vacation and confirm that > > (1) that patch is not a shot in the dark: it allows them to expose a > class of issues in production > > (2) they fail to expose that same class of issues using other existing > mechanisms, including those just suggested in this thread > > I just wanted to avoid early rejection of this health check ability on > the grounds that the situation it exposes should just not happen. Or > that, if enabled and heavily used, it would have a performance impact. I think the inital push back from nova is we already have ping rpc function https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/baserpc.py#L55-L76 so if a geneirc metion called ping is added it will break nova. the reset of the push back is related to not haveing a concrete usecase, including concern over perfroamce consideration and external services potenailly acessing the rpc bus which is coniserd an internal api. e.g. we woudl not want an external monitoring solution connecting to the rpc bus and invoking arbitary RPC calls, ping is well pretty safe but form a design point of view while litening to notification is fine we dont want anything outside of the openstack services actully sending message on the rpc bus. so if this does actully detect somethign we can otherwise detect and the use cases involves using it within the openstack services not form an external source then i think that is fine but we proably need to use another name (alive? status?) or otherewise modify nova so that there is no conflict. > From dev.faz at gmail.com Thu Aug 13 13:13:45 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Thu, 13 Aug 2020 15:13:45 +0200 Subject: [nova][neutron][oslo][ops] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: Hi, just did some short tests today in our test-environment (without durable queues and without replication): * started a rally task to generate some load * kill-9-ed rabbitmq on one node * rally task immediately stopped and the cloud (mostly) stopped working after some debugging i found (again) exchanges which had bindings to queues, but these bindings didnt forward any msgs. Wrote a small script to detect these broken bindings and will now check if this is "reproducible" then I will try "durable queues" and "durable queues with replication" to see if this helps. Even if I would expect rabbitmq should be able to handle this without these "hidden broken bindings" This just FYI. Fabian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From moreira.belmiro.email.lists at gmail.com Thu Aug 13 13:15:41 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Thu, 13 Aug 2020 15:15:41 +0200 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: Message-ID: Hi, we would really appreciate your comments on this. Especially the OSC team and all the project teams that are facing issues migrating their clients. Let us know, Belmiro On Mon, Aug 10, 2020 at 10:13 AM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > during the last PTG the TC discussed the problem of supporting different > clients (OpenStack Client - OSC vs python-*clients) [1]. > Currently, we don't have feature parity between the OSC and the > python-*clients. > > Different OpenStack projects invest in different clients. > This can be a huge problem for users/ops. Depending on the projects > deployed in their infrastructures, they need to use different clients for > different tasks. > It's confusing because of the partial implementation in the OSC. > > There was also the proposal to enforce new functionality only in the SDK > (and optionally the OSC) and not the project’s specific clients to stop > increasing the disparity between the two. > > We would like to understand first the problems and missing pieces that > projects are facing to move into OSC and help to overcome them. > Let us know. > > Belmiro, > on behalf of the TC > > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015418.html > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekuvaja at redhat.com Thu Aug 13 13:36:34 2020 From: ekuvaja at redhat.com (Erno Kuvaja) Date: Thu, 13 Aug 2020 14:36:34 +0100 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: Message-ID: On Thu, Aug 13, 2020 at 2:19 PM Belmiro Moreira < moreira.belmiro.email.lists at gmail.com> wrote: > Hi, > we would really appreciate your comments on this. > Especially the OSC team and all the project teams that are facing issues > migrating their clients. > > Let us know, > Belmiro > > In Glance perspective we already stated that we're more than happy to try endorsing osc again once it has stabilized the Images API facing code and maintained feature parity for a few cycles. Just stopping developing python-glanceclient will only result in no up-to-date stable client for Images API developed under OpenStack Governance. I really don't think forcing to fork python-glanceclient to keep development going outside of OpenStack Governance will be the better solution here. - jokke On Mon, Aug 10, 2020 at 10:13 AM Belmiro Moreira < > moreira.belmiro.email.lists at gmail.com> wrote: > >> Hi, >> during the last PTG the TC discussed the problem of supporting different >> clients (OpenStack Client - OSC vs python-*clients) [1]. >> Currently, we don't have feature parity between the OSC and the >> python-*clients. >> >> Different OpenStack projects invest in different clients. >> This can be a huge problem for users/ops. Depending on the projects >> deployed in their infrastructures, they need to use different clients for >> different tasks. >> It's confusing because of the partial implementation in the OSC. >> >> There was also the proposal to enforce new functionality only in the SDK >> (and optionally the OSC) and not the project’s specific clients to stop >> increasing the disparity between the two. 
>> >> We would like to understand first the problems and missing pieces that >> projects are facing to move into OSC and help to overcome them. >> Let us know. >> >> Belmiro, >> on behalf of the TC >> >> [1] >> http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015418.html >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwm2012 at gmail.com Thu Aug 13 13:38:20 2020 From: pwm2012 at gmail.com (pwm) Date: Thu, 13 Aug 2020 21:38:20 +0800 Subject: DNS server instead of /etc/hosts file on Infra Server In-Reply-To: <7db29753-4710-a979-fe71-67a829fa55c3@rd.bbc.co.uk> References: <7db29753-4710-a979-fe71-67a829fa55c3@rd.bbc.co.uk> Message-ID: Great will check it out. Thanks, Jonathan. On Wed, Aug 12, 2020 at 9:39 PM Jonathan Rosser < jonathan.rosser at rd.bbc.co.uk> wrote: > Openstack-Ansible already supports optionally using the unbound dns server > instead of managing > /etc/hosts. > > Join #openstack-ansible on IRC if you need any help. > > Regards, > Jonathan. > On 12/08/2020 14:03, pwm wrote: > > Hi, > Plan to use PowerDNS server instead of the /etc/hosts file for resolving. > Has anyone done this before? > The PowerDNS support MySQL DB backend and a frontend GUI PowerDNS Admin > which allows centralized easy maintenance. > > Thanks > > On Sun, Aug 9, 2020 at 11:54 PM pwm wrote: > >> Hi, >> Anyone interested in replacing the /etc/hosts file entry with a DNS >> server on the openstack-ansible deployment? >> >> Thank you >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.ramirez at opencloud.es Thu Aug 13 13:59:49 2020 From: luis.ramirez at opencloud.es (Luis Ramirez) Date: Thu, 13 Aug 2020 15:59:49 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Great! I'll try to do the same. Br, Luis Rmz Blockchain, DevOps & Open Source Cloud Solutions Architect ---------------------------------------- Founder & CEO OpenCloud.es luis.ramirez at opencloud.es Skype ID: d.overload Hangouts: luis.ramirez at opencloud.es [image: ] +34 911 950 123 / [image: ]+39 392 1289553 / [image: ]+49 152 26917722 / Česká republika: +420 774 274 882 ----------------------------------------------------- El jue., 13 ago. 2020 a las 13:44, Rafael Weingärtner (< rafaelweingartner at gmail.com>) escribió: > Awesome, thanks! > I will try to dedicate a few hours every week to review CloudKitty patches. > > On Thu, Aug 13, 2020 at 8:36 AM Pierre Riteau wrote: > >> Thank you both. >> >> I've merged a few patches to fix CI and finalise the Ussuri release >> (for example release notes were missing). >> I gave core reviewer privileges to Rafael and Luis. Let's try to merge >> patches with two +2 votes from now on. >> >> On Wed, 12 Aug 2020 at 23:38, Luis Ramirez >> wrote: >> > >> > Sounds good to me. >> > >> > El El mié, 12 ago 2020 a las 22:41, Rafael Weingärtner < >> rafaelweingartner at gmail.com> escribió: >> >> >> >> Sounds good to me. >> >> >> >> On Wed, Aug 12, 2020 at 5:37 PM Pierre Riteau >> wrote: >> >>> >> >>> I have now received core reviewer privileges. Thank you to TC members >> >>> >> >>> >> >>> for trusting us with the CloudKitty project. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> I would like to kick things off by resuming IRC meetings. They're set >> >>> >> >>> >> >>> to run every two weeks (on odd weeks) on Monday at 1400 UTC in >> >>> >> >>> >> >>> #cloudkitty. 
Is this a convenient time slot for all potential >> >>> >> >>> >> >>> contributors to the project? >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Wed, 12 Aug 2020 at 19:08, Rafael Weingärtner >> >>> >> >>> >> >>> wrote: >> >>> >> >>> >> >>> > >> >>> >> >>> >> >>> > Awesome! Thank you guys for the help. >> >>> >> >>> >> >>> > We have few PRs open there that are ready (or close to be ready) to >> be merged. >> >>> >> >>> >> >>> > >> >>> >> >>> >> >>> > On Wed, Aug 12, 2020 at 1:59 PM Mohammed Naser >> wrote: >> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> >> On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez < >> thierry at openstack.org> wrote: >> >>> >> >>> >> >>> >> > >> >>> >> >>> >> >>> >> > Thomas Goirand wrote: >> >>> >> >>> >> >>> >> > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: >> >>> >> >>> >> >>> >> > >> Thanks, Pierre for helping with this. >> >>> >> >>> >> >>> >> > >> >> >>> >> >>> >> >>> >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) < >> justin.ferrieu at objectif-libre.com>) >> >>> >> >>> >> >>> >> > >> but I am not sure if he got any response back. >> >>> >> >>> >> >>> >> > >> >>> >> >>> >> >>> >> > No response so far, but they may all be in company summer >> vacation. >> >>> >> >>> >> >>> >> > >> >>> >> >>> >> >>> >> > > The end of the very good maintenance of Cloudkitty matched the >> date when >> >>> >> >>> >> >>> >> > > objectif libre was sold to Linkbynet. Maybe the new owner >> don't care enough? >> >>> >> >>> >> >>> >> > > >> >>> >> >>> >> >>> >> > > This is very disappointing as I've been using it for some time >> already, >> >>> >> >>> >> >>> >> > > and that I was satisfied by it (ie: it does the job...), and >> especially >> >>> >> >>> >> >>> >> > > that latest releases are able to scale correctly. >> >>> >> >>> >> >>> >> > > >> >>> >> >>> >> >>> >> > > I very much would love if Pierre Riteau was successful in >> taking over. >> >>> >> >>> >> >>> >> > > Good luck Pierre! I'll try to help whenever I can and if I'm >> not too busy. >> >>> >> >>> >> >>> >> > >> >>> >> >>> >> >>> >> > Given the volunteers (Pierre, Rafael, Luis) I would support the >> TC using >> >>> >> >>> >> >>> >> > its unholy powers to add extra core reviewers to cloudkitty. >> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> >> https://review.opendev.org/#/c/745653 is currently merging and >> fungi will be >> >>> >> >>> >> >>> >> adding Pierre as a core. >> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> >> Thank you for helping. >> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> >> > If the current PTL comes back, I'm sure they will appreciate the >> help, >> >>> >> >>> >> >>> >> > and can always fix/revert things before Victoria release. >> >>> >> >>> >> >>> >> > >> >>> >> >>> >> >>> >> > -- >> >>> >> >>> >> >>> >> > Thierry Carrez (ttx) >> >>> >> >>> >> >>> >> > >> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> >> -- >> >>> >> >>> >> >>> >> Mohammed Naser >> >>> >> >>> >> >>> >> VEXXHOST, Inc. 
>> >>> >> >>> >> >>> >> >> >>> >> >>> >> >>> > >> >>> >> >>> >> >>> > >> >>> >> >>> >> >>> > -- >> >>> >> >>> >> >>> > Rafael Weingärtner >> >>> >> >>> >> >> >> >> >> >> -- >> >> Rafael Weingärtner >> >> >> >> >> > -- >> > Br, >> > Luis Rmz >> > Blockchain, DevOps & Open Source Cloud Solutions Architect >> > ---------------------------------------- >> > Founder & CEO >> > OpenCloud.es >> > luis.ramirez at opencloud.es >> > Skype ID: d.overload >> > Hangouts: luis.ramirez at opencloud.es >> > +34 911 950 123 / +39 392 1289553 / +49 152 26917722 >> > > > -- > Rafael Weingärtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Thu Aug 13 14:07:40 2020 From: ashlee at openstack.org (Ashlee Ferguson) Date: Thu, 13 Aug 2020 09:07:40 -0500 Subject: Community Voting for The Virtual Summit Sessions is Open! Message-ID: <02EC11B6-6D57-4DFA-94B6-F920E90A4FF6@openstack.org> Community voting for the virtual Open Infrastructure Summit sessions is open! You can VOTE HERE , but what does that mean? Now that the Call for Presentations has closed, all submissions are available for community vote and input. After community voting closes, the volunteer Programming Committee members will receive the results to review to help them determine the final selections for Summit schedule. While community votes are meant to help inform the decision, Programming Committee members are expected to exercise judgment in their area of expertise and help ensure diversity of sessions and speakers. View full details of the session selection process . In order to vote, you need an OSF community membership. If you do not have an account, please create one by going to openstack.org/join . If you need to reset your password, you can do that here . Hurry, voting closes Monday, August 17 at 11:59pm Pacific Time. Don’t forget to Register for the Summit for free! Visit https://www.openstack.org/summit/2020/ for all other Summit-related information. Interested in sponsoring? Visit this page . If you have any questions, please email summit at openstack.org . Cheers, Ashlee Ashlee Ferguson Community & Events Coordinator OpenStack Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasowang at redhat.com Thu Aug 13 04:24:50 2020 From: jasowang at redhat.com (Jason Wang) Date: Thu, 13 Aug 2020 12:24:50 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200810074631.GA29059@joy-OptiPlex-7040> References: <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> Message-ID: On 2020/8/10 下午3:46, Yan Zhao wrote: >> driver is it handled by? > It looks that the devlink is for network device specific, and in > devlink.h, it says > include/uapi/linux/devlink.h - Network physical device Netlink > interface, Actually not, I think there used to have some discussion last year and the conclusion is to remove this comment. It supports IB and probably vDPA in the future. > I feel like it's not very appropriate for a GPU driver to use > this interface. Is that right? I think not though most of the users are switch or ethernet devices. It doesn't prevent you from inventing new abstractions. 
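As a rough illustration (a sketch, not tied to any project, assuming a
devlink-aware driver and an iproute2 recent enough to support JSON output),
the per-device information devlink already exposes can be pulled from
userspace like this:

import json
import subprocess

def devlink_dev_info(dev="pci/0000:03:00.0"):
    # "devlink dev info" reports the driver name, serial number and
    # fixed/running/stored firmware versions for the given device
    out = subprocess.run(
        ["devlink", "-j", "dev", "info", dev],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

if __name__ == "__main__":
    print(devlink_dev_info())

Those are the same kinds of attributes a compatibility check would want to
reason about.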
Note that devlink is based on netlink, netlink has been widely used by various subsystems other than networking. Thanks > > Thanks > Yan > > From alifshit at redhat.com Thu Aug 13 14:30:49 2020 From: alifshit at redhat.com (Artom Lifshitz) Date: Thu, 13 Aug 2020 10:30:49 -0400 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> Message-ID: On Mon, Aug 10, 2020 at 4:40 AM Luigi Toscano wrote: > > On Monday, 10 August 2020 10:26:24 CEST Radosław Piliszek wrote: > > On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < > > > > moreira.belmiro.email.lists at gmail.com> wrote: > > > Hi, > > > during the last PTG the TC discussed the problem of supporting different > > > clients (OpenStack Client - OSC vs python-*clients) [1]. > > > Currently, we don't have feature parity between the OSC and the > > > python-*clients. > > > > Is it true of any client? I guess some are just OSC plugins 100%. > > Do we know which clients have this disparity? > > Personally, I encountered this with Glance the most and Cinder to some > > extent (but I believe over the course of action Cinder got all features I > > wanted from it in the OSC). > > As far as I know there is still a huge problem with microversion handling > which impacts some cinder features. It has been discussed in the past and > still present. Yeah, my understanding is that osc will never "properly" support microversions. Openstacksdk is the future in that sense, and my understanding is that the osc team is "porting" osc to use the sdk. Given these two thing, when we (Nova) talked about this with the osc folks, we decided that rather than catch up osc to python-novaclient, we'd rather focus our efforts on the sdk. I've been slowly doing that [1], starting with the earlier Nova microversions. The eventual long term goal is for the Nova team to *only* support the sdk, and drop python-novaclient entirely, but that's a long time away. [1] https://review.opendev.org/#/q/status:open+project:openstack/openstacksdk+branch:master+topic:story/2007929 > > > -- > Luigi > > > From moguimar at redhat.com Thu Aug 13 15:06:31 2020 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Thu, 13 Aug 2020 17:06:31 +0200 Subject: [oslo] Proposing Lance Bragstad as oslo.cache core Message-ID: Hello everybody, It is my pleasure to propose Lance Bragstad (lbragstad) as a new member of the oslo.core core team. Lance has been a big contributor to the project and is known as a walking version of the Keystone documentation, which happens to be one of the biggest consumers of oslo.cache. Obviously we think he'd make a good addition to the core team. If there are no objections, I'll make that happen in a week. Thanks. -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Aug 13 15:08:31 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 13 Aug 2020 10:08:31 -0500 Subject: [oslo] Proposing Lance Bragstad as oslo.cache core In-Reply-To: References: Message-ID: <82be76d9-0898-63a7-d339-48e9f6db540c@gmx.com> On 8/13/20 10:06 AM, Moises Guimaraes de Medeiros wrote: > Hello everybody, > > It is my pleasure to propose Lance Bragstad (lbragstad) as a new > member of the oslo.core core team. 
> > Lance has been a big contributor to the project and is known as a > walking version of the Keystone documentation, which happens to be one > of the biggest consumers of oslo.cache. > > Obviously we think he'd make a good addition to the core team. If > there are no objections, I'll make that happen in a week. > +1! -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Aug 13 15:18:45 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Aug 2020 16:18:45 +0100 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> Message-ID: <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> On Thu, 2020-08-13 at 10:30 -0400, Artom Lifshitz wrote: > On Mon, Aug 10, 2020 at 4:40 AM Luigi Toscano wrote: > > > > On Monday, 10 August 2020 10:26:24 CEST Radosław Piliszek wrote: > > > On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < > > > > > > moreira.belmiro.email.lists at gmail.com> wrote: > > > > Hi, > > > > during the last PTG the TC discussed the problem of supporting different > > > > clients (OpenStack Client - OSC vs python-*clients) [1]. > > > > Currently, we don't have feature parity between the OSC and the > > > > python-*clients. > > > > > > Is it true of any client? I guess some are just OSC plugins 100%. > > > Do we know which clients have this disparity? > > > Personally, I encountered this with Glance the most and Cinder to some > > > extent (but I believe over the course of action Cinder got all features I > > > wanted from it in the OSC). > > > > As far as I know there is still a huge problem with microversion handling > > which impacts some cinder features. It has been discussed in the past and > > still present. > > Yeah, my understanding is that osc will never "properly" support > microversions. it does already properly support micorversion the issue is not everyone agrees on what properly means. the behavior of the project clients was considered broken by many. it has been poirpose that we explcity allow a way to opt in to the auto negociation via a new "auto" sentaial value and i have also suggested that we should tag each comman with the minium microversion that parmater or command requires and decault to that minium based on teh arges you passed. both of those imporvement dont break the philosipy of providing stable cli behavior across cloud and would imporve the ux. defaulting to the minium microversion needed for the arguments passed would solve most of the ux issues and adding an auto sentical would resolve the rest while still keeping the correct microversion behvior it already has. the glance and cinder gaps are not really related to microverions by the way. its just that no one has done the work and cinder an glance have not require contiuptors to update osc as part of adding new features. nova has not required that either but there were some who worked on nova that cared enough about osc to mention it in code review or submit patches themsevles. the glance team does not really have the resouces to do that and the osc team does not have the resouce to maintain clis for all teams. so over tiem as service poject added new feature the gaps have increase since there were not people tyring to keep it in sync. > Openstacksdk is the future in that sense, and my > understanding is that the osc team is "porting" osc to use the sdk. 
> Given these two thing, when we (Nova) talked about this with the osc > folks, we decided that rather than catch up osc to python-novaclient, > we'd rather focus our efforts on the sdk. well that is not entirly a good caraterisation. we want to catch up osc too but the suggest was to support eveything in osc then it would be easier to add osc support since it just has to call the sdk functions. we did not say we dont want to close the gaps in osc. > I've been slowly doing that > [1], starting with the earlier Nova microversions. The eventual long > term goal is for the Nova team to *only* support the sdk, and drop > python-novaclient entirely, but that's a long time away. > > [1] https://review.opendev.org/#/q/status:open+project:openstack/openstacksdk+branch:master+topic:story/2007929 > > > > > > > -- > > Luigi > > > > > > > > From openstack at nemebean.com Thu Aug 13 15:28:12 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 13 Aug 2020 10:28:12 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> Message-ID: On 8/13/20 7:14 AM, Sean Mooney wrote: > On Thu, 2020-08-13 at 10:24 +0200, Thierry Carrez wrote: >> Ben Nemec wrote: >>> On 8/12/20 5:32 AM, Thierry Carrez wrote: >>>> Sean Mooney wrote: >>>>> On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: >>>>>> I wonder if this does help though. It seems like a bug that a >>>>>> nova-compute service would stop processing messages and still be >>>>>> seen as up in the service status. Do we understand why that is >>>>>> happening? If not, I'm unclear that a ping living at the >>>>>> oslo.messaging layer is going to do a better job of exposing such an >>>>>> outage. The fact that oslo.messaging is responding does not >>>>>> necessarily equate to nova-compute functioning as expected. >>>>>> >>>>>> To be clear, this is not me nacking the ping feature. I just want to >>>>>> make sure we understand what is going on here so we don't add >>>>>> another unreliable healthchecking mechanism to the one we already have. >>>>> >>>>> [...] >>>>> im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug >>>>> that is motiviting the creation of this oslo ping >>>>> feature but that feels premature if it is. i think it would be better >>>>> try to adress this by the sender recreating the >>>>> queue if the deliver fails and if that is not viable then protpyope >>>>> thge fix in nova. if the self ping fixes this >>>>> miss queue error then we could extract the cod into oslo. >>>> >>>> I think this is missing the point... This is not about working around >>>> a specific bug, it's about adding a way to detect a certain class of >>>> failure. It's more of an operational feature than a development bugfix. >>>> >>>> If I understood correctly, OVH is running that patch in production as >>>> a way to detect certain problems they regularly run into, something >>>> our existing monitor mechanisms fail to detect. That sounds like a >>>> worthwhile addition? >>> >>> Okay, I don't think I was aware that this was already being used. 
If >>> someone already finds it useful and it's opt-in then I'm not inclined to >>> block it. My main concern was that we were adding a feature that didn't >>> actually address the problem at hand. >>> >>> I _would_ feel better about it if someone could give an example of a >>> type of failure this is detecting that is missed by other monitoring >>> methods though. Both because having a concrete example of a use case for >>> the feature is good, and because if it turns out that the problems this >>> is detecting are things like the Nova bug Sean is talking about (which I >>> don't think this would catch anyway, since the topic is missing and >>> there's nothing to ping) then there may be other changes we can/should >>> make to improve things. >> >> Right. Let's wait for Arnaud to come back from vacation and confirm that >> >> (1) that patch is not a shot in the dark: it allows them to expose a >> class of issues in production >> >> (2) they fail to expose that same class of issues using other existing >> mechanisms, including those just suggested in this thread >> >> I just wanted to avoid early rejection of this health check ability on >> the grounds that the situation it exposes should just not happen. Or >> that, if enabled and heavily used, it would have a performance impact. > I think the inital push back from nova is we already have ping rpc function > https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/baserpc.py#L55-L76 > so if a geneirc metion called ping is added it will break nova. It occurred to me after I commented on the review that we have tempest running on oslo.messaging changes and it passed on the patch for this. I suppose it's possible that it broke some error handling in Nova that just isn't tested, but maybe the new ping could function as a cross-project replacement for the Nova ping? Anyway, it's still be to deduplicate the name, but I felt kind of dumb about having asked if it was tested when the test results were right in front of me. ;-) > > the reset of the push back is related to not haveing a concrete usecase, including concern over > perfroamce consideration and external services potenailly acessing the rpc bus which is coniserd an internal > api. e.g. we woudl not want an external monitoring solution connecting to the rpc bus and invoking arbitary > RPC calls, ping is well pretty safe but form a design point of view while litening to notification is fine > we dont want anything outside of the openstack services actully sending message on the rpc bus. I'm not concerned about the performance impact here. It's an optional feature, so anyone using it is choosing to take that hit. Having external stuff on the RPC bus is more of a gray area, but it's not like we can stop operators from doing that. I think it's probably better to provide a well-defined endpoint for them to talk to rather than have everyone implement their own slightly different RPC ping mechanism. The docs for this feature should be very explicit that this is the only thing external code should be calling. > > so if this does actully detect somethign we can otherwise detect and the use cases involves using it within > the openstack services not form an external source then i think that is fine but we proably need to use another > name (alive? status?) or otherewise modify nova so that there is no conflict. >> > If I understand your analysis of the bug correctly, this would have caught that type of outage after all since the failure was asymmetric. 
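To make that concrete, the sort of probe being discussed can already be
expressed against the existing nova base ping that Sean linked. The following
is only a sketch: the transport URL, topic and host names are placeholders,
and a real check would reuse the service's own oslo.config setup and request
context rather than an empty dict.

import oslo_messaging
from oslo_config import cfg

def rpc_ping(transport_url, topic, server, timeout=10):
    # target the per-host queue of a service (e.g. topic "compute",
    # server "compute-01") and call the base ping from nova/baserpc.py
    transport = oslo_messaging.get_rpc_transport(cfg.CONF, url=transport_url)
    target = oslo_messaging.Target(topic=topic, server=server,
                                   namespace='baseapi', version='1.0')
    client = oslo_messaging.RPCClient(transport, target, timeout=timeout)
    try:
        # a timeout here means the queue exists but nothing consumed the
        # message, or the reply never made it back
        return client.call({}, 'ping', arg='healthcheck')
    except oslo_messaging.MessagingTimeout:
        return None
    finally:
        transport.cleanup()

if __name__ == '__main__':
    reply = rpc_ping('rabbit://guest:guest@127.0.0.1:5672/',
                     'compute', 'compute-01')
    print('alive' if reply else 'no reply')
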
The compute node was still able to send its status updates to Nova, but wasn't receiving any messages. A ping would have detected that situation. From openstack at nemebean.com Thu Aug 13 15:28:43 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 13 Aug 2020 10:28:43 -0500 Subject: [oslo] Proposing Lance Bragstad as oslo.cache core In-Reply-To: References: Message-ID: <305f4be5-950d-97a6-63d0-fe59d56bfe8d@nemebean.com> +1! On 8/13/20 10:06 AM, Moises Guimaraes de Medeiros wrote: > Hello everybody, > > It is my pleasure to propose Lance Bragstad (lbragstad) as a new member > of the oslo.core core team. > > Lance has been a big contributor to the project and is known as a > walking version of the Keystone documentation, which happens to be one > of the biggest consumers of oslo.cache. > > Obviously we think he'd make a good addition to the core team. If there > are no objections, I'll make that happen in a week. > > Thanks. > > -- > > Moisés Guimarães > > Software Engineer > > Red Hat > > > From allison at openstack.org Thu Aug 13 15:28:55 2020 From: allison at openstack.org (Allison Price) Date: Thu, 13 Aug 2020 10:28:55 -0500 Subject: Running OpenStack? Take the 2020 User Survey now! Message-ID: <1E70817D-D322-4B3E-8940-762DBAA7708A@openstack.org> Hi everyone, There is only one week left before we are closing the 2020 OpenStack User Survey [1]! If you are running OpenStack, please take a few minutes to log your deployment—all information will remain anonymous unless you indicate otherwise. If you have completed a User Survey before, all you have to do is update your information and answer a few new questions. Anonymous feedback will be passed along to the upstream project teams, and anonymized data will be available in the analytics dashboard [2]. The deadline to add your deployment to this round of analysis is Thursday, August 20. Let me know if you have any questions or issues completing. Thanks! Allison [1] https://www.openstack.org/user-survey/survey-2020 [2] https://www.openstack.org/analytics -------------- next part -------------- An HTML attachment was scrubbed... URL: From abishop at redhat.com Thu Aug 13 15:42:30 2020 From: abishop at redhat.com (Alan Bishop) Date: Thu, 13 Aug 2020 08:42:30 -0700 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> Message-ID: On Thu, Aug 13, 2020 at 8:27 AM Sean Mooney wrote: > On Thu, 2020-08-13 at 10:30 -0400, Artom Lifshitz wrote: > > On Mon, Aug 10, 2020 at 4:40 AM Luigi Toscano > wrote: > > > > > > On Monday, 10 August 2020 10:26:24 CEST Radosław Piliszek wrote: > > > > On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < > > > > > > > > moreira.belmiro.email.lists at gmail.com> wrote: > > > > > Hi, > > > > > during the last PTG the TC discussed the problem of supporting > different > > > > > clients (OpenStack Client - OSC vs python-*clients) [1]. > > > > > Currently, we don't have feature parity between the OSC and the > > > > > python-*clients. > > > > > > > > Is it true of any client? I guess some are just OSC plugins 100%. > > > > Do we know which clients have this disparity? > > > > Personally, I encountered this with Glance the most and Cinder to > some > > > > extent (but I believe over the course of action Cinder got all > features I > > > > wanted from it in the OSC). 
> > > > > > As far as I know there is still a huge problem with microversion > handling > > > which impacts some cinder features. It has been discussed in the past > and > > > still present. > > > > Yeah, my understanding is that osc will never "properly" support > > microversions. > it does already properly support micorversion the issue is not everyone > agrees > on what properly means. the behavior of the project clients was considered > broken > by many. it has been poirpose that we explcity allow a way to opt in to > the auto negociation > via a new "auto" sentaial value and i have also suggested that we should > tag each comman with the minium > microversion that parmater or command requires and decault to that minium > based on teh arges you passed. > > both of those imporvement dont break the philosipy of providing stable cli > behavior across cloud and would > imporve the ux. defaulting to the minium microversion needed for the > arguments passed would solve most of the ux > issues and adding an auto sentical would resolve the rest while still > keeping the correct microversion behvior it > already has. > > the glance and cinder gaps are not really related to microverions by the > way. > its just that no one has done the work and cinder an glance have not > require contiuptors to update > Updates to osc from cinder's side are pretty much stalled due to lack of support for microversions. A patch for that was rejected and we've had trouble getting an update on a viable path forward. See comment in https://review.opendev.org/590807. Alan > osc as part of adding new features. nova has not required that either but > there were some who worked on nova > that cared enough about osc to mention it in code review or submit patches > themsevles. the glance team does > not really have the resouces to do that and the osc team does not have the > resouce to maintain clis for all teams. > > so over tiem as service poject added new feature the gaps have increase > since there were not people tyring to keep it in > sync. > > > Openstacksdk is the future in that sense, and my > > understanding is that the osc team is "porting" osc to use the sdk. > > Given these two thing, when we (Nova) talked about this with the osc > > folks, we decided that rather than catch up osc to python-novaclient, > > we'd rather focus our efforts on the sdk. > well that is not entirly a good caraterisation. we want to catch up osc too > but the suggest was to support eveything in osc then it would be easier to > add osc support > since it just has to call the sdk functions. we did not say we dont want > to close the gaps in osc. > > > I've been slowly doing that > > [1], starting with the earlier Nova microversions. The eventual long > > term goal is for the Nova team to *only* support the sdk, and drop > > python-novaclient entirely, but that's a long time away. > > > > [1] > https://review.opendev.org/#/q/status:open+project:openstack/openstacksdk+branch:master+topic:story/2007929 > > > > > > > > > > > -- > > > Luigi > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From smooney at redhat.com Thu Aug 13 16:07:18 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Aug 2020 17:07:18 +0100 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> Message-ID: <65204b738f13fcea16b9b6d5a68149c89be73e6a.camel@redhat.com> On Thu, 2020-08-13 at 10:28 -0500, Ben Nemec wrote: > > On 8/13/20 7:14 AM, Sean Mooney wrote: > > On Thu, 2020-08-13 at 10:24 +0200, Thierry Carrez wrote: > > > Ben Nemec wrote: > > > > On 8/12/20 5:32 AM, Thierry Carrez wrote: > > > > > Sean Mooney wrote: > > > > > > On Tue, 2020-08-11 at 15:20 -0500, Ben Nemec wrote: > > > > > > > I wonder if this does help though. It seems like a bug that a > > > > > > > nova-compute service would stop processing messages and still be > > > > > > > seen as up in the service status. Do we understand why that is > > > > > > > happening? If not, I'm unclear that a ping living at the > > > > > > > oslo.messaging layer is going to do a better job of exposing such an > > > > > > > outage. The fact that oslo.messaging is responding does not > > > > > > > necessarily equate to nova-compute functioning as expected. > > > > > > > > > > > > > > To be clear, this is not me nacking the ping feature. I just want to > > > > > > > make sure we understand what is going on here so we don't add > > > > > > > another unreliable healthchecking mechanism to the one we already have. > > > > > > > > > > > > [...] > > > > > > im not sure https://bugs.launchpad.net/nova/+bug/1854992 is the bug > > > > > > that is motiviting the creation of this oslo ping > > > > > > feature but that feels premature if it is. i think it would be better > > > > > > try to adress this by the sender recreating the > > > > > > queue if the deliver fails and if that is not viable then protpyope > > > > > > thge fix in nova. if the self ping fixes this > > > > > > miss queue error then we could extract the cod into oslo. > > > > > > > > > > I think this is missing the point... This is not about working around > > > > > a specific bug, it's about adding a way to detect a certain class of > > > > > failure. It's more of an operational feature than a development bugfix. > > > > > > > > > > If I understood correctly, OVH is running that patch in production as > > > > > a way to detect certain problems they regularly run into, something > > > > > our existing monitor mechanisms fail to detect. That sounds like a > > > > > worthwhile addition? > > > > > > > > Okay, I don't think I was aware that this was already being used. If > > > > someone already finds it useful and it's opt-in then I'm not inclined to > > > > block it. My main concern was that we were adding a feature that didn't > > > > actually address the problem at hand. > > > > > > > > I _would_ feel better about it if someone could give an example of a > > > > type of failure this is detecting that is missed by other monitoring > > > > methods though. 
Both because having a concrete example of a use case for > > > > the feature is good, and because if it turns out that the problems this > > > > is detecting are things like the Nova bug Sean is talking about (which I > > > > don't think this would catch anyway, since the topic is missing and > > > > there's nothing to ping) then there may be other changes we can/should > > > > make to improve things. > > > > > > Right. Let's wait for Arnaud to come back from vacation and confirm that > > > > > > (1) that patch is not a shot in the dark: it allows them to expose a > > > class of issues in production > > > > > > (2) they fail to expose that same class of issues using other existing > > > mechanisms, including those just suggested in this thread > > > > > > I just wanted to avoid early rejection of this health check ability on > > > the grounds that the situation it exposes should just not happen. Or > > > that, if enabled and heavily used, it would have a performance impact. > > > > I think the inital push back from nova is we already have ping rpc function > > https://github.com/openstack/nova/blob/c6218428e9b29a2c52808ec7d27b4b21aadc0299/nova/baserpc.py#L55-L76 > > so if a geneirc metion called ping is added it will break nova. > > It occurred to me after I commented on the review that we have tempest > running on oslo.messaging changes and it passed on the patch for this. I > suppose it's possible that it broke some error handling in Nova that > just isn't tested, but maybe the new ping could function as a > cross-project replacement for the Nova ping? proably yes its only used in one place https://opendev.org/openstack/nova/src/branch/master/nova/conductor/api.py#L66-L72 which is only used here in the nova service base class https://github.com/openstack/nova/blob/0b613729ff975f69587a17cc7818c09f7683ebf2/nova/service.py#L126 os worst case i think its just going to cause the service to start before the conductor is ready however they have to tolerate the conductor restarting ectra anyway so i dont think it will break anything too badly. i dont see why we coudl not use a generic version instead. > > Anyway, it's still be to deduplicate the name, but I felt kind of dumb > about having asked if it was tested when the test results were right in > front of me. ;-) > > > > > the reset of the push back is related to not haveing a concrete usecase, including concern over > > perfroamce consideration and external services potenailly acessing the rpc bus which is coniserd an internal > > api. e.g. we woudl not want an external monitoring solution connecting to the rpc bus and invoking arbitary > > RPC calls, ping is well pretty safe but form a design point of view while litening to notification is fine > > we dont want anything outside of the openstack services actully sending message on the rpc bus. > > I'm not concerned about the performance impact here. It's an optional > feature, so anyone using it is choosing to take that hit. > > Having external stuff on the RPC bus is more of a gray area, but it's > not like we can stop operators from doing that. well upstream certenly we cant really stop them. downstream on the other hadn without going through the certification process to have your product certifed to work with our downstream distrobution directlly invoking RPC endpoint would invlaidate your support. so form a dwonstream perpective we do have ways to prevent that via docs and makeing it clear that it not supported. 
we can technically do that upstream but cant really enforce it, its opensouce software after all if you break it then you get to keep the broken pices. > I think it's probably > better to provide a well-defined endpoint for them to talk to rather > than have everyone implement their own slightly different RPC ping > mechanism. The docs for this feature should be very explicit that this > is the only thing external code should be calling. ya i think that is a good approch. i would still prefer if people used say middelware to add a service ping admin api endpoint instead of driectly calling the rpc endpoint to avoid exposing rabbitmq but that is out of scope of this discussion. > > > > > so if this does actully detect somethign we can otherwise detect and the use cases involves using it within > > the openstack services not form an external source then i think that is fine but we proably need to use another > > name (alive? status?) or otherewise modify nova so that there is no conflict. > > > > > If I understand your analysis of the bug correctly, this would have > caught that type of outage after all since the failure was asymmetric. am im not sure it might yes looking at https://review.opendev.org/#/c/735385/6 its not clear to me how the endpoint is invoked. is it doing a topic send or a direct send? to detech the failure you would need to invoke a ping on the compute service and that ping would have to been encured on the to nova topic exchante with a routing key of compute. if the compute topic queue was broken either because it was nolonger bound to the correct topic or due to some other rabbitmq error then you woudl either get a message undeilverbale error of some kind with the mandaroy flag or likely a timeout without the mandaroty flag. so if the ping would be routed usign a topic too compute. then yes it would find this. although we can also detech this ourselves and fix it using the mandatory flag i think by just recreating the queue wehn it extis but we get an undeliverable message, at least i think we can rabbit is not my main are of expertiese so it woudl be nice is someone that know more about it can weigh in on that. > The compute node was still able to send its status updates to Nova, but > wasn't receiving any messages. A ping would have detected that situation. > From kennelson11 at gmail.com Thu Aug 13 16:19:27 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 13 Aug 2020 09:19:27 -0700 Subject: [PTL][SIG][TC] vPTG October 2020 Team Signup Message-ID: Greetings! As you hopefully already know, our next PTG will be virtual again, and held from Monday October 26th to Friday October 30th. We will have the same schedule set up available as last time with three windows of time spread across the day to cover all timezones with breaks in between. *To signup your team, you must complete **BOTH** the survey[1] AND reserve time in the ethercalc[2] by September 11th at 7:00 UTC.* We ask that the PTL/SIG Chair/Team lead sign up for time to have their discussions in with 4 rules/guidelines. 1. Cross project discussions (like SIGs or support project teams) should be scheduled towards the start of the week so that any discussions that might shape those of other teams happen first. 2. No team should sign up for more than 4 hours per UTC day to help keep participants actively engaged. 3. No team should sign up for more than 16 hours across all time slots to avoid burning out our contributors and to enable participation in multiple teams discussions. 
Again, you need to fill out BOTH the ethercalc AND the survey to complete your team's sign up. If you have any issues with signing up your team, due to conflict or otherwise, please let me know! While we are trying to empower you to make your own decisions as to when you meet and for how long (after all, you know your needs and teams timezones better than we do), we are here to help! Once your team is signed up, please register! And remind your team to register! Registration is free, but since it will be how we contact you with passwords, event details, etc. it is still important! Continue to check back for updates at openstack.org/ptg. -the Kendalls (diablo_rojo & wendallkaters) [1] Team Survey: https://openstackfoundation.formstack.com/forms/june2020_virtual_ptg_survey [2] Ethercalc Signup: https://ethercalc.openstack.org/126u8ek25noy [3] PTG Registration: https://october2020ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Aug 13 16:21:31 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 13 Aug 2020 11:21:31 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <65204b738f13fcea16b9b6d5a68149c89be73e6a.camel@redhat.com> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> <65204b738f13fcea16b9b6d5a68149c89be73e6a.camel@redhat.com> Message-ID: On 8/13/20 11:07 AM, Sean Mooney wrote: >> I think it's probably >> better to provide a well-defined endpoint for them to talk to rather >> than have everyone implement their own slightly different RPC ping >> mechanism. The docs for this feature should be very explicit that this >> is the only thing external code should be calling. > ya i think that is a good approch. > i would still prefer if people used say middelware to add a service ping admin api endpoint > instead of driectly calling the rpc endpoint to avoid exposing rabbitmq but that is out of scope of this discussion. Completely agree. In the long run I would like to see this replaced with better integrated healthchecking in OpenStack, but we've been talking about that for years and have made minimal progress. > >> >>> >>> so if this does actully detect somethign we can otherwise detect and the use cases involves using it within >>> the openstack services not form an external source then i think that is fine but we proably need to use another >>> name (alive? status?) or otherewise modify nova so that there is no conflict. >>>> >> >> If I understand your analysis of the bug correctly, this would have >> caught that type of outage after all since the failure was asymmetric. > am im not sure > it might yes looking at https://review.opendev.org/#/c/735385/6 > its not clear to me how the endpoint is invoked. is it doing a topic send or a direct send? > to detech the failure you would need to invoke a ping on the compute service and that ping would > have to been encured on the to nova topic exchante with a routing key of compute. 
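To make the "topic send with a routing key of compute" question above concrete, here is a minimal client-side sketch using plain oslo.messaging. It is not taken from the patch under review (https://review.opendev.org/#/c/735385/); the method name and the assumption that the ping is exposed as an ordinary RPC endpoint on the service's topic are mine, and the hostname is made up.

    # Sketch only, assuming transport_url is already set in the loaded config
    # and that the ping is reachable as a normal RPC method; the real method
    # name in the proposed feature may differ.
    import oslo_messaging
    from oslo_config import cfg

    transport = oslo_messaging.get_rpc_transport(cfg.CONF)
    # Topic send: routed through the service's topic exchange; server= pins
    # the call to one host's queue (the hostname here is hypothetical).
    target = oslo_messaging.Target(topic='compute', server='compute-01')
    client = oslo_messaging.RPCClient(transport, target, timeout=10)
    try:
        client.call({}, 'oslo_rpc_server_ping')
    except oslo_messaging.MessagingTimeout:
        # The queue can still exist while nothing healthy consumes it, which
        # is the asymmetric failure discussed above: a timeout here is the
        # signal the monitoring side is after.
        print('compute-01 did not answer the RPC ping')

A direct send (reply-queue style) would not exercise the compute topic binding at all, so a check like this only tells you something useful if it goes through the same topic routing the service normally listens on.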
> > if the compute topic queue was broken either because it was nolonger bound to the correct topic or due to some other > rabbitmq error then you woudl either get a message undeilverbale error of some kind with the mandaroy flag or likely a > timeout without the mandaroty flag. so if the ping would be routed usign a topic too compute. > then yes it would find this. > > although we can also detech this ourselves and fix it using the mandatory flag i think by just recreating the queue wehn > it extis but we get an undeliverable message, at least i think we can rabbit is not my main are of expertiese so it > woudl be nice is someone that know more about it can weigh in on that. I pinged Ken this morning to take a look at that. He should be able to tell us whether it's a good idea or crazy talk. :-) From ekuvaja at redhat.com Thu Aug 13 16:27:16 2020 From: ekuvaja at redhat.com (Erno Kuvaja) Date: Thu, 13 Aug 2020 17:27:16 +0100 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> Message-ID: On Thu, Aug 13, 2020 at 4:46 PM Alan Bishop wrote: > > > On Thu, Aug 13, 2020 at 8:27 AM Sean Mooney wrote: > >> On Thu, 2020-08-13 at 10:30 -0400, Artom Lifshitz wrote: >> > On Mon, Aug 10, 2020 at 4:40 AM Luigi Toscano >> wrote: >> > > >> > > On Monday, 10 August 2020 10:26:24 CEST Radosław Piliszek wrote: >> > > > On Mon, Aug 10, 2020 at 10:19 AM Belmiro Moreira < >> > > > >> > > > moreira.belmiro.email.lists at gmail.com> wrote: >> > > > > Hi, >> > > > > during the last PTG the TC discussed the problem of supporting >> different >> > > > > clients (OpenStack Client - OSC vs python-*clients) [1]. >> > > > > Currently, we don't have feature parity between the OSC and the >> > > > > python-*clients. >> > > > >> > > > Is it true of any client? I guess some are just OSC plugins 100%. >> > > > Do we know which clients have this disparity? >> > > > Personally, I encountered this with Glance the most and Cinder to >> some >> > > > extent (but I believe over the course of action Cinder got all >> features I >> > > > wanted from it in the OSC). >> > > >> > > As far as I know there is still a huge problem with microversion >> handling >> > > which impacts some cinder features. It has been discussed in the past >> and >> > > still present. >> > >> > Yeah, my understanding is that osc will never "properly" support >> > microversions. >> it does already properly support micorversion the issue is not everyone >> agrees >> on what properly means. the behavior of the project clients was >> considered broken >> by many. it has been poirpose that we explcity allow a way to opt in to >> the auto negociation >> via a new "auto" sentaial value and i have also suggested that we should >> tag each comman with the minium >> microversion that parmater or command requires and decault to that minium >> based on teh arges you passed. >> >> both of those imporvement dont break the philosipy of providing stable >> cli behavior across cloud and would >> imporve the ux. defaulting to the minium microversion needed for the >> arguments passed would solve most of the ux >> issues and adding an auto sentical would resolve the rest while still >> keeping the correct microversion behvior it >> already has. >> >> the glance and cinder gaps are not really related to microverions by the >> way. 
>> its just that no one has done the work and cinder an glance have not >> require contiuptors to update >> > > Updates to osc from cinder's side are pretty much stalled due to lack of > support for microversions. A patch for that was rejected and we've had > trouble getting an update on a viable path forward. See comment in > https://review.opendev.org/590807. > > Alan > > >> osc as part of adding new features. nova has not required that either but >> there were some who worked on nova >> that cared enough about osc to mention it in code review or submit >> patches themsevles. the glance team does >> not really have the resouces to do that and the osc team does not have >> the resouce to maintain clis for all teams. >> >> so over tiem as service poject added new feature the gaps have increase >> since there were not people tyring to keep it in >> sync. >> >> > Openstacksdk is the future in that sense, and my >> > understanding is that the osc team is "porting" osc to use the sdk. >> > Given these two thing, when we (Nova) talked about this with the osc >> > folks, we decided that rather than catch up osc to python-novaclient, >> > we'd rather focus our efforts on the sdk. >> well that is not entirly a good caraterisation. we want to catch up osc >> too >> but the suggest was to support eveything in osc then it would be easier >> to add osc support >> since it just has to call the sdk functions. we did not say we dont want >> to close the gaps in osc. >> >> > I've been slowly doing that >> > [1], starting with the earlier Nova microversions. The eventual long >> > term goal is for the Nova team to *only* support the sdk, and drop >> > python-novaclient entirely, but that's a long time away. >> > >> > [1] >> https://review.opendev.org/#/q/status:open+project:openstack/openstacksdk+branch:master+topic:story/2007929 >> > >> > > >> > > >> > > -- >> > > Luigi >> > > >> > > >> > > >> > >> > >> > So if I understand the whole picture correctly the situation has actually nothing to do with directly working OSC in favor of python-*client provided CLIs but actually moving everything to OSSDK so it can be supported and used by OSC to be used as default client for everything? As that seems to be a consensus that it's not enough to get the OSC to do the right thing if the client lib under the hood is still python-*client and specially if microversions. My question at this point is, do we (as a community) have enough bodies dedicated to OSSDK _and_ OSC to make this sustainable? I'm being sincere here as I have not been part of the development of either of those projects. But if my assumption above is correct, I think we should talk about these things with their real names rather than trying to mask this being just OSC vs python-*client CLI thing. - jokke -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Aug 13 16:31:34 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 13 Aug 2020 11:31:34 -0500 Subject: [PTL][SIG][TC] vPTG October 2020 Team Signup In-Reply-To: References: Message-ID: On 8/13/20 11:19 AM, Kendall Nelson wrote: > [2] Ethercalc Signup: https://ethercalc.openstack.org/126u8ek25noy This is taking me to the ethercalc from last time. I assume that wasn't intentional? 
From fungi at yuggoth.org Thu Aug 13 16:41:31 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 13 Aug 2020 16:41:31 +0000 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> Message-ID: <20200813164131.bdmhankpd2qxycux@yuggoth.org> On 2020-08-13 17:27:16 +0100 (+0100), Erno Kuvaja wrote: [...] > My question at this point is, do we (as a community) have enough > bodies dedicated to OSSDK _and_ OSC to make this sustainable? I'm > being sincere here as I have not been part of the development of > either of those projects. But if my assumption above is correct, I > think we should talk about these things with their real names > rather than trying to mask this being just OSC vs python-*client > CLI thing. Hopefully this doesn't come across as a glib response, but if people didn't have to maintain multiple CLIs and SDKs then maybe they would have enough time to collaborate on a universal CLI/SDK pair instead. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kennelson11 at gmail.com Thu Aug 13 16:43:03 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 13 Aug 2020 09:43:03 -0700 Subject: [PTL][SIG][TC] vPTG October 2020 Team Signup In-Reply-To: References: Message-ID: SIGH. Yes. Here is the new ethercalc: https://ethercalc.openstack.org/7xp2pcbh1ncb Sorry for the confusion! -Kendall (diablo_rojo) On Thu, Aug 13, 2020 at 9:31 AM Ben Nemec wrote: > > > On 8/13/20 11:19 AM, Kendall Nelson wrote: > > [2] Ethercalc Signup: https://ethercalc.openstack.org/126u8ek25noy > > This is taking me to the ethercalc from last time. I assume that wasn't > intentional? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alifshit at redhat.com Thu Aug 13 17:03:05 2020 From: alifshit at redhat.com (Artom Lifshitz) Date: Thu, 13 Aug 2020 13:03:05 -0400 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: <20200813164131.bdmhankpd2qxycux@yuggoth.org> References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> <20200813164131.bdmhankpd2qxycux@yuggoth.org> Message-ID: On Thu, Aug 13, 2020 at 12:45 PM Jeremy Stanley wrote: > > On 2020-08-13 17:27:16 +0100 (+0100), Erno Kuvaja wrote: > [...] > > My question at this point is, do we (as a community) have enough > > bodies dedicated to OSSDK _and_ OSC to make this sustainable? I'm > > being sincere here as I have not been part of the development of > > either of those projects. But if my assumption above is correct, I > > think we should talk about these things with their real names > > rather than trying to mask this being just OSC vs python-*client > > CLI thing. > > Hopefully this doesn't come across as a glib response, but if people > didn't have to maintain multiple CLIs and SDKs then maybe they would > have enough time to collaborate on a universal CLI/SDK pair instead. Agreed - but historically that's not what happened, so the question now is how to improve the situation. My understanding is that osc is effectively dead, except as a shell around the sdk, since that's where the future lies. So in my mind, efforts should be concentrated on two fronts: 1. Continue converting osc to use the sdk 2. 
Catch up the SDK This is a bit of a chicken and egg problem, because any gaps in sdk mean you can't convert osc to use those missing bits, but ideally any patches to osc that aren't sdk conversions would get blocked (though I have obviously absolutely no say in the matter, this is just wishful thinking). The project teams can work on 2 for their project (so like I've been slowly doing for Nova), the osc team can work on 1. > -- > Jeremy Stanley From kennelson11 at gmail.com Thu Aug 13 17:09:13 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 13 Aug 2020 10:09:13 -0700 Subject: [PTL][SIG][TC] vPTG October 2020 Team Signup In-Reply-To: References: Message-ID: Sigh. I guess I should have known better than to send this out without having a cup of tea first. The survey link in the original email is also from the last PTG. Please use this survey link: https://openstackfoundation.formstack.com/forms/oct2020_vptg_survey -Kendall (diablo_rojo) On Thu, Aug 13, 2020 at 9:43 AM Kendall Nelson wrote: > SIGH. Yes. Here is the new ethercalc: > > https://ethercalc.openstack.org/7xp2pcbh1ncb > > Sorry for the confusion! > > -Kendall (diablo_rojo) > > On Thu, Aug 13, 2020 at 9:31 AM Ben Nemec wrote: > >> >> >> On 8/13/20 11:19 AM, Kendall Nelson wrote: >> > [2] Ethercalc Signup: https://ethercalc.openstack.org/126u8ek25noy >> >> This is taking me to the ethercalc from last time. I assume that wasn't >> intentional? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Aug 13 17:09:42 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 13 Aug 2020 12:09:42 -0500 Subject: [oslo] vPTG scheduling Message-ID: <59c4975f-67c4-8351-caef-b4937e641741@nemebean.com> Continuing my policy of EAFP scheduling, I've signed us up for two hours starting at our regular meeting time. This has worked well for our past couple of virtual events so I didn't see any reason to change it. If that time doesn't work for you, please let me know ASAP so we can make alternate arrangements. Thanks. -Ben From fungi at yuggoth.org Thu Aug 13 17:20:19 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 13 Aug 2020 17:20:19 +0000 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> <20200813164131.bdmhankpd2qxycux@yuggoth.org> Message-ID: <20200813172018.pw3mo6viekvzb7wx@yuggoth.org> On 2020-08-13 13:03:05 -0400 (-0400), Artom Lifshitz wrote: [...] > This is a bit of a chicken and egg problem, because any gaps in > sdk mean you can't convert osc to use those missing bits, but > ideally any patches to osc that aren't sdk conversions would get > blocked (though I have obviously absolutely no say in the matter, > this is just wishful thinking). [...] I think you do have a say. At the very least, this is why it's being discussed on the mailing list, but also as a contributor you get to vote on TC members to represent your interests in these sorts of decisions, and for that matter the team's leadership has been very willing to give interested folks more direct decision making ability as evidenced by the large core review group for the SDK repo. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kgiusti at gmail.com Thu Aug 13 18:55:54 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Thu, 13 Aug 2020 14:55:54 -0400 Subject: [oslo] Proposing Lance Bragstad as oslo.cache core In-Reply-To: References: Message-ID: +1 for Lance! On Thu, Aug 13, 2020 at 11:17 AM Moises Guimaraes de Medeiros < moguimar at redhat.com> wrote: > Hello everybody, > > It is my pleasure to propose Lance Bragstad (lbragstad) as a new member > of the oslo.core core team. > > Lance has been a big contributor to the project and is known as a walking > version of the Keystone documentation, which happens to be one of the > biggest consumers of oslo.cache. > > Obviously we think he'd make a good addition to the core team. If there > are no objections, I'll make that happen in a week. > > Thanks. > > -- > > Moisés Guimarães > > Software Engineer > > Red Hat > > > -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cohuck at redhat.com Thu Aug 13 15:33:47 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Thu, 13 Aug 2020 17:33:47 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200807135942.5d56a202.cohuck@redhat.com> References: <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> <20200807135942.5d56a202.cohuck@redhat.com> Message-ID: <20200813173347.239801fa.cohuck@redhat.com> On Fri, 7 Aug 2020 13:59:42 +0200 Cornelia Huck wrote: > On Wed, 05 Aug 2020 12:35:01 +0100 > Sean Mooney wrote: > > > On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote: > > > Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: > > (...) > > > > > software_version: device driver's version. > > > > in .[.bugfix] scheme, where there is no > > > > compatibility across major versions, minor versions have > > > > forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and > > > > bugfix version number indicates some degree of internal > > > > improvement that is not visible to the user in terms of > > > > features or compatibility, > > > > > > > > vendor specific attributes: each vendor may define different attributes > > > > device id : device id of a physical devices or mdev's parent pci device. > > > > it could be equal to pci id for pci devices > > > > aggregator: used together with mdev_type. e.g. aggregator=2 together > > > > with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel > > > > graphics device. > > > > remote_url: for a local NVMe VF, it may be configured with a remote > > > > url of a remote storage and all data is stored in the > > > > remote side specified by the remote url. > > > > ... > > just a minor not that i find ^ much more simmple to understand then > > the current proposal with self and compatiable. > > if i have well defiend attibute that i can parse and understand that allow > > me to calulate the what is and is not compatible that is likely going to > > more useful as you wont have to keep maintianing a list of other compatible > > devices every time a new sku is released. 
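As an aside on the "<major>.<minor>[.bugfix]" scheme quoted above, the comparison it implies is small enough to sketch. This is only an illustration of the proposed semantics (no compatibility across major versions, forward-only compatibility for minor versions, bugfix level ignored), not an agreed-upon interface; the function name is invented.

    # Illustration of the quoted rule only.
    def software_version_compatible(src, dst):
        src_major, src_minor = (int(p) for p in src.split('.')[:2])
        dst_major, dst_minor = (int(p) for p in dst.split('.')[:2])
        if src_major != dst_major:
            return False                 # no compatibility across majors
        return dst_minor >= src_minor    # 1 -> 2 ok, 2 -> 1 not

    assert software_version_compatible('1.0.0', '1.2.3')
    assert not software_version_compatible('1.2.0', '1.0.0')
    assert not software_version_compatible('2.0.0', '1.9.9')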
> > > > in anycase thank for actully shareing ^ as it make it simpler to reson about what > > you have previously proposed. > > So, what would be the most helpful format? A 'software_version' field > that follows the conventions outlined above, and other (possibly > optional) fields that have to match? Just to get a different perspective, I've been trying to come up with what would be useful for a very different kind of device, namely vfio-ccw. (Adding Eric to cc: for that.) software_version makes sense for everybody, so it should be a standard attribute. For the vfio-ccw type, we have only one vendor driver (vfio-ccw_IO). Given a subchannel A, we want to make sure that subchannel B has a reasonable chance of being compatible. I guess that means: - same subchannel type (I/O) - same chpid type (e.g. all FICON; I assume there are no 'mixed' setups -- Eric?) - same number of chpids? Maybe we can live without that and just inject some machine checks, I don't know. Same chpid numbers is something we cannot guarantee, especially if we want to migrate cross-CEC (to another machine.) Other possibly interesting information is not available at the subchannel level (vfio-ccw is a subchannel driver.) So, looking at a concrete subchannel on one of my machines, it would look something like the following: software_version=1.0.0 type=vfio-ccw <-- would be vfio-pci on the example above subchannel_type=0 chpid_type=0x1a chpid_mask=0xf0 <-- not sure if needed/wanted Does that make sense? From farman at linux.ibm.com Thu Aug 13 19:02:53 2020 From: farman at linux.ibm.com (Eric Farman) Date: Thu, 13 Aug 2020 15:02:53 -0400 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200813173347.239801fa.cohuck@redhat.com> References: <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> <20200807135942.5d56a202.cohuck@redhat.com> <20200813173347.239801fa.cohuck@redhat.com> Message-ID: <315669b0-5c75-d359-a912-62ebab496abf@linux.ibm.com> On 8/13/20 11:33 AM, Cornelia Huck wrote: > On Fri, 7 Aug 2020 13:59:42 +0200 > Cornelia Huck wrote: > >> On Wed, 05 Aug 2020 12:35:01 +0100 >> Sean Mooney wrote: >> >>> On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote: >>>> Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: >> >> (...) >> >>>>> software_version: device driver's version. >>>>> in .[.bugfix] scheme, where there is no >>>>> compatibility across major versions, minor versions have >>>>> forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and >>>>> bugfix version number indicates some degree of internal >>>>> improvement that is not visible to the user in terms of >>>>> features or compatibility, >>>>> >>>>> vendor specific attributes: each vendor may define different attributes >>>>> device id : device id of a physical devices or mdev's parent pci device. >>>>> it could be equal to pci id for pci devices >>>>> aggregator: used together with mdev_type. e.g. aggregator=2 together >>>>> with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel >>>>> graphics device. 
>>>>> remote_url: for a local NVMe VF, it may be configured with a remote >>>>> url of a remote storage and all data is stored in the >>>>> remote side specified by the remote url. >>>>> ... >>> just a minor not that i find ^ much more simmple to understand then >>> the current proposal with self and compatiable. >>> if i have well defiend attibute that i can parse and understand that allow >>> me to calulate the what is and is not compatible that is likely going to >>> more useful as you wont have to keep maintianing a list of other compatible >>> devices every time a new sku is released. >>> >>> in anycase thank for actully shareing ^ as it make it simpler to reson about what >>> you have previously proposed. >> >> So, what would be the most helpful format? A 'software_version' field >> that follows the conventions outlined above, and other (possibly >> optional) fields that have to match? > > Just to get a different perspective, I've been trying to come up with > what would be useful for a very different kind of device, namely > vfio-ccw. (Adding Eric to cc: for that.) > > software_version makes sense for everybody, so it should be a standard > attribute. > > For the vfio-ccw type, we have only one vendor driver (vfio-ccw_IO). > > Given a subchannel A, we want to make sure that subchannel B has a > reasonable chance of being compatible. I guess that means: > > - same subchannel type (I/O) > - same chpid type (e.g. all FICON; I assume there are no 'mixed' setups > -- Eric?) Correct. > - same number of chpids? Maybe we can live without that and just inject > some machine checks, I don't know. Same chpid numbers is something we > cannot guarantee, especially if we want to migrate cross-CEC (to > another machine.) I think we'd live without it, because I wouldn't expect it to be consistent between systems. > > Other possibly interesting information is not available at the > subchannel level (vfio-ccw is a subchannel driver.) I presume you're alluding to the DASD uid (dasdinfo -x) here? > > So, looking at a concrete subchannel on one of my machines, it would > look something like the following: > > > software_version=1.0.0 > type=vfio-ccw <-- would be vfio-pci on the example above > > subchannel_type=0 > > chpid_type=0x1a > chpid_mask=0xf0 <-- not sure if needed/wanted > > Does that make sense? > From alex.kavanagh at canonical.com Thu Aug 13 19:21:48 2020 From: alex.kavanagh at canonical.com (Alex Kavanagh) Date: Thu, 13 Aug 2020 20:21:48 +0100 Subject: [charms] OpenStack Charms 20.08 release is now available Message-ID: The 20.08 release of the OpenStack Charms is now available. This release brings several new features to the existing OpenStack Charms deployments for Queens, Rocky, Stein, Train, Ussuri, and many stable combinations of Ubuntu + OpenStack. Please see the Release Notes for full details: https://docs.openstack.org/charm-guide/latest/2008.html == Highlights == * New charm: neutron-api-plugin-arista There is a new supported subordinate charm that provides Arista switch ML2 plugin support to the OpenStack Neutron API service: neutron-api-plugin-arista. * New charms: Trilio The Trilio charms (trilio-data-mover, trilio-dm-api, trilio-horizon-plugin, and trilio-wlm) have been promoted to supported status. These charms deploy TrilioVault, a commercial snapshot and restore solution for OpenStack. * New charm: keystone-kerberos The keystone-kerberos subordinate charm allows for per-domain authentication via a Kerberos ticket, thereby providing an additional layer of security. 
It is used in conjunction with the keystone charm. * MySQL InnoDB Cluster TLS communication TLS communication between MySQL InnoDB Cluster and its cloud clients is now supported. Due to the circular dependency between the vault and mysql-innodb-cluster applications, this is a post-deployment feature. * Gnocchi S3 support The gnocchi charm can now be configured to use S3 as a storage backend. This feature is available starting with OpenStack Stein. * Charm cinder-ceph supports a new relation When both the nova-compute and cinder-ceph applications are deployed a new relation is now required. This should not affect most currently deployed clouds. * Glance Simplestreams Sync The glance-simplestreams-sync charm now installs simplestreams as a snap. As such, the 'channel' configuration option should be used in place of the ‘source’ option. == OpenStack Charms team == The OpenStack Charms team can be contacted on the #openstack-charms IRC channel on Freenode. == Thank you == Lots of thanks to the below 37 charm contributors who squashed 114 bugs*, enabled support for a new release of OpenStack, improved documentation, and added exciting new functionality! Alex Kavanagh Aurelien Lourot James Page Peter Matulis Liam Young Hervé Beraud Corey Bryant Frode Nordahl David Ames Ryan Beisner Chris MacNaughton Dmitrii Shcherbakov Drew Freiberger Edward Hope-Morley Facundo Ciccioli Andreas Jaeger Pedro Guimarães Nobuto Murata Arif Ali Felipe Reyes Ponnuvel Palaniyappan Brett Alvaro Uria Marco Filipe Moutinho da Silva Alejandro Santoyo Gonzalez Camille Rodriguez oliveiradan Tiago Pasqualini Erlon R. Cruz Trent Lloyd Nikolay Vinogradov Andrew McLeod Mauricio Faria de Oliveira Vern Hart Jeff Hillman Rodrigo Barbieri Nicolas Bock * The contributor and bug numbers are based on the OpenStack Victoria development cycle. -- OpenStack Charms Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgiusti at gmail.com Thu Aug 13 21:17:51 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Thu, 13 Aug 2020 17:17:51 -0400 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <671fec63-8bea-4215-c773-d8360e368a99@sap.com> <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> <65204b738f13fcea16b9b6d5a68149c89be73e6a.camel@redhat.com> Message-ID: On Thu, Aug 13, 2020 at 12:30 PM Ben Nemec wrote: > > > On 8/13/20 11:07 AM, Sean Mooney wrote: > >> I think it's probably > >> better to provide a well-defined endpoint for them to talk to rather > >> than have everyone implement their own slightly different RPC ping > >> mechanism. The docs for this feature should be very explicit that this > >> is the only thing external code should be calling. > > ya i think that is a good approch. > > i would still prefer if people used say middelware to add a service ping > admin api endpoint > > instead of driectly calling the rpc endpoint to avoid exposing rabbitmq > but that is out of scope of this discussion. > > Completely agree. In the long run I would like to see this replaced with > better integrated healthchecking in OpenStack, but we've been talking > about that for years and have made minimal progress. 
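For readers following the "well-defined endpoint" point quoted above, a service-side sketch of what such an endpoint could look like with plain oslo.messaging is below. The class and method names are invented for illustration (and, as noted earlier in the thread, a bare "ping" would collide with Nova's existing RPC method, so a real implementation needs a more specific name); this is not the code in the proposed patch.

    # Sketch under assumptions: a single trivial, documented RPC method that
    # monitoring is explicitly allowed to call; everything else on the bus
    # stays internal.
    import oslo_messaging
    from oslo_config import cfg

    class HealthEndpoint(object):
        def ping(self, ctxt, **kwargs):
            # Reaching this code proves the topic queue is bound and a
            # consumer is draining it -- nothing more.
            return 'pong'

    transport = oslo_messaging.get_rpc_transport(cfg.CONF)
    target = oslo_messaging.Target(topic='compute', server='compute-01')
    server = oslo_messaging.get_rpc_server(transport, target,
                                           [HealthEndpoint()],
                                           executor='threading')
    server.start()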
> > > > >> > >>> > >>> so if this does actully detect somethign we can otherwise detect and > the use cases involves using it within > >>> the openstack services not form an external source then i think that > is fine but we proably need to use another > >>> name (alive? status?) or otherewise modify nova so that there is no > conflict. > >>>> > >> > >> If I understand your analysis of the bug correctly, this would have > >> caught that type of outage after all since the failure was asymmetric. > > am im not sure > > it might yes looking at https://review.opendev.org/#/c/735385/6 > > its not clear to me how the endpoint is invoked. is it doing a topic > send or a direct send? > > to detech the failure you would need to invoke a ping on the compute > service and that ping would > > have to been encured on the to nova topic exchante with a routing key of > compute. > > > > if the compute topic queue was broken either because it was nolonger > bound to the correct topic or due to some other > > rabbitmq error then you woudl either get a message undeilverbale error > of some kind with the mandaroy flag or likely a > > timeout without the mandaroty flag. so if the ping would be routed usign > a topic too compute. > > then yes it would find this. > > > > although we can also detech this ourselves and fix it using the > mandatory flag i think by just recreating the queue wehn > > it extis but we get an undeliverable message, at least i think we can > rabbit is not my main are of expertiese so it > > woudl be nice is someone that know more about it can weigh in on that. > > I pinged Ken this morning to take a look at that. He should be able to > tell us whether it's a good idea or crazy talk. :-) > Like I can tell the difference between crazy and good ideas. Ben I thought you knew me better. ;) As discussed you can enable the mandatory flag on a per RPCClient instance, for example: _topts = oslo_messaging.TransportOptions(at_least_once=True) client = oslo_messaging.RPCClient(self.transport, self.target, timeout=conf.timeout, version_cap=conf.target_version, transport_options=_topts).prepare() This will cause an rpc call/cast to fail if rabbitmq cannot find a queue for the rpc request message [note the difference between 'queuing the message' and 'having the message consumed' - the mandatory flag has nothing to do with whether or not the message is eventually consumed]. Keep in mind that there may be some cases where having no active consumers is ok and you do not want to get a delivery failure exception - specifically fanout or perhaps cast. Depends on the use case. If there are fanout use cases that fail or degrade if all present services don't get a message then the mandatory flag will not detect an error if a subset of the bindings are lost. My biggest concern with this type of failure (lost binding) is that apparently the consumer is none the wiser when it happens. Without some sort of event issued by rabbitmq the RPC server cannot detect this problem and take corrective actions (or at least I cannot think of any ATM). -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosmaita.fossdev at gmail.com Thu Aug 13 21:36:10 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 13 Aug 2020 17:36:10 -0400 Subject: [cinder] victoria mid-cycle part 2 summary available Message-ID: <30dce86d-4afd-f2cb-2b84-61b730c279b6@gmail.com> In case you missed yesterday's R-9 virtual mid-cycle session, I've updated the victoria mid-cycle wiki with a summary: https://wiki.openstack.org/wiki/CinderVictoriaMidCycleSummary It will eventually include a link to the recording (in case you want to see what you missed or if you want to re-live the excitement). We had a productive meeting yesterday, thanks to all who participated. Unfortunately, some people had trouble connecting to the videoconference. Please contact me off-list so we can figure out whether this was a one-time fail or if we need to look at some other videoconf solution for future meetings. cheers, brian From rosmaita.fossdev at gmail.com Thu Aug 13 21:53:44 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 13 Aug 2020 17:53:44 -0400 Subject: [cinder] victoria os-brick release coming soon Message-ID: Just a quick reminder that the victoria os-brick release is 3 weeks away. Reviews may be a bit slower than usual given that people may be taking some end-of-the-summer vacation, so if you have an important patch for os-brick, please take the initiative to raise awareness in the #openstack-cinder IRC channel if it's not getting the attention it deserves. cheers, brian From rosmaita.fossdev at gmail.com Thu Aug 13 22:05:00 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 13 Aug 2020 18:05:00 -0400 Subject: [cinder] victoria new feature status checkpoint next week Message-ID: <8b4f6f98-fbcb-9354-94aa-e01ef0912ebb@gmail.com> If you are working on a Cinder feature for Victoria that hasn't merged yet, please add it to the agenda for next week's cinder weekly meeting on 19 August at 1400 UTC: https://etherpad.opendev.org/p/cinder-victoria-meetings If your feature requires client support, keep in mind that the final release for client libraries is in four weeks. Any client changes must be reviewed, tested, and merged before 10 September. Keep in mind that 7 September is a holiday for many Cinder core reviewers, so it is likely that we will have reduced reviewer bandwidth around the time of the Feature Freeze. So please plan ahead. cheers, brian From rosmaita.fossdev at gmail.com Thu Aug 13 22:15:27 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 13 Aug 2020 18:15:27 -0400 Subject: [cinder] driver features declaration for victoria next week Message-ID: Hello Cinder driver maintainers, This is a reminder that new features added to Cinder drivers for the Victoria release must be merged at the time of the OpenStack-wide Feature Freeze, which is coming up soon (10 September, to be specific). In order to avoid the Festival of Insane Driver Reviewing that we had last cycle, if you have un-merged driver features that you would like to land in Victoria, please post a blueprint in Launchpad listing the Gerrit reviews of the associated patches before the next Cinder weekly meeting (that is, before 19 August at 1400 UTC). This will help the team prioritize reviews and give you candid early feedback on whether the features look ready. You can look among the Ussuri blueprints for examples; contact me in IRC if you have any questions. 
Due to the 7 September holiday in the USA, there will be reduced reviewing bandwidth right around the Feature Freeze, so that's why I'm asking you to plan ahead. cheers, brian From rosmaita.fossdev at gmail.com Thu Aug 13 22:45:08 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 13 Aug 2020 18:45:08 -0400 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> <20200813164131.bdmhankpd2qxycux@yuggoth.org> Message-ID: <2956d6bd-320e-34ea-64a0-1001e102d75c@gmail.com> On 8/13/20 1:03 PM, Artom Lifshitz wrote: > On Thu, Aug 13, 2020 at 12:45 PM Jeremy Stanley wrote: >> >> On 2020-08-13 17:27:16 +0100 (+0100), Erno Kuvaja wrote: >> [...] >>> My question at this point is, do we (as a community) have enough >>> bodies dedicated to OSSDK _and_ OSC to make this sustainable? I'm >>> being sincere here as I have not been part of the development of >>> either of those projects. But if my assumption above is correct, I >>> think we should talk about these things with their real names >>> rather than trying to mask this being just OSC vs python-*client >>> CLI thing. >> >> Hopefully this doesn't come across as a glib response, but if people >> didn't have to maintain multiple CLIs and SDKs then maybe they would >> have enough time to collaborate on a universal CLI/SDK pair instead. > > Agreed - but historically that's not what happened, so the question > now is how to improve the situation. My understanding is that osc is > effectively dead, except as a shell around the sdk, since that's where > the future lies. So in my mind, efforts should be concentrated on two > fronts: > > 1. Continue converting osc to use the sdk > 2. Catch up the SDK My understanding is that the SDK is supposed to be an opinionated entry point to the APIs? Or am I thinking of some other project? I'm bringing this up because people say they want a single unified CLI, but when I've pushed operators about this, they want a CLI in Victoria that implements all the admin operations exposed by the Victorian-era APIs. A CLI built on an opinionated SDK is not going to do that. I could use some clarification on the goal and strategy here. If it's to provide a unified opinionated CLI, then I don't see how that helps us to eventually eliminate the project-specific CLIs. And if it's to provide one CLI that rules them all, the individual projects (well, Cinder, anyway) can't stop adding functionality to cinderclient CLI until the openstackclient CLI has feature parity. At least now, you can use one CLI to do all cinder-related stuff. If we stop cinderclient CLI development, then you'll need to use openstackclient for some things (old features + the latest features) and the cinderclient for all the in between features, which doesn't seem like progress to me. Thus it would be helpful to have some clarification about the nature of the proposal we're discussing. > > This is a bit of a chicken and egg problem, because any gaps in sdk > mean you can't convert osc to use those missing bits, but ideally any > patches to osc that aren't sdk conversions would get blocked (though I > have obviously absolutely no say in the matter, this is just wishful > thinking). The project teams can work on 2 for their project (so like > I've been slowly doing for Nova), the osc team can work on 1. 
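To make the question about the SDK's "opinionated" nature a bit more concrete, here is a rough sketch of the two styles people usually mean when they talk about openstacksdk replacing the per-project clients: the thin per-service proxy layer and the higher-level "cloud" layer. The cloud name and the image/flavor/network values are placeholders.

    # Sketch only; assumes a clouds.yaml entry named "mycloud" exists.
    import openstack

    conn = openstack.connect(cloud='mycloud')

    # Proxy layer: close to what python-novaclient exposes today.
    for server in conn.compute.servers():
        print(server.name, server.status)

    # Cloud layer: the opinionated part, hiding cross-service plumbing
    # such as image/flavor/network lookups and polling until ACTIVE.
    server = conn.create_server(name='demo', image='cirros',
                                flavor='m1.tiny', network='private',
                                wait=True)

Both layers live in the same package, which is why "finish the SDK first, then have OSC call into it" keeps coming up in this thread.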
> > > >> -- >> Jeremy Stanley > > From rosmaita.fossdev at gmail.com Thu Aug 13 22:51:34 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 13 Aug 2020 18:51:34 -0400 Subject: [cinder] driver maintainers: 3rd party CI checkpoint reminder Message-ID: <71c6bc06-06a0-6957-1755-063e76c57b2f@gmail.com> Hello Cinder driver maintainers, Around the time of the Feature Freeze (10 September), the Cinder team will be looking at the Third Party CIs to assess compliance [0]. Out of compliance drivers will be marked as 'unsupported' in the Victoria release. We can avoid a lot of unpleasantness if you take this opportunity to review the current situation of your driver's CI [1] and, if necessary, take appropriate steps to get it back into compliance before 10 September. cheers, brian [0] https://docs.openstack.org/cinder/latest/drivers-all-about.html#driver-compliance [1] http://cinderstats.ivehearditbothways.com/cireport.txt From fungi at yuggoth.org Thu Aug 13 23:02:50 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 13 Aug 2020 23:02:50 +0000 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: <2956d6bd-320e-34ea-64a0-1001e102d75c@gmail.com> References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> <20200813164131.bdmhankpd2qxycux@yuggoth.org> <2956d6bd-320e-34ea-64a0-1001e102d75c@gmail.com> Message-ID: <20200813230250.63rbvs4xaznpcejd@yuggoth.org> On 2020-08-13 18:45:08 -0400 (-0400), Brian Rosmaita wrote: [...] > My understanding is that the SDK is supposed to be an opinionated > entry point to the APIs? Or am I thinking of some other project? [...] It's modelled as several layers: direct REST API access, functional access (similar to what our classic python-*client libs provided), and an opinionated layer with more business logic and plaster over cloud-specific interoperability problems (formerly the Shade library which grew out of Nodepool). Callers can mix-n-match the layers, like use a higher level call to get a Keystone token and then use it to authenticate REST API methods. https://docs.openstack.org/openstacksdk/latest/user/index.html#api-documentation -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From zbitter at redhat.com Fri Aug 14 00:37:36 2020 From: zbitter at redhat.com (Zane Bitter) Date: Thu, 13 Aug 2020 20:37:36 -0400 Subject: [Ocata][Heat] Strange error returned after stack creation failure -r aw template with id xxx not found In-Reply-To: References: <7fe6626a-0abb-97ca-fbfb-2066f426b9bf@redhat.com> Message-ID: On 24/07/20 10:59 am, Laurent Dumont wrote: > Hey Zane, > > Thank you so much for the details - super interesting. We've worked with > the Vendor to try and reproduce while we had our logs for Heat turned to > DEBUG. Unfortunately, all of the creations they have attempted since > have worked. It first failed 4 times out of 5 and has since worked... Interesting - sounds like a timing issue, but I haven't spotted any code that looks like it could fail by going too fast. > It's one of those problems! We'll keep trying to reproduce. Just to be > sure, the actual yaml is stored in the DB and then accessed to create > the actual Heat ressources? Yep, correct. 
It's stored and the ID is passed in the RPC message here: https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L308 https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L372-L374 https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L336-L337 and then when the other engine receives the create_stack RPC message it uses the stored template instead of one passed in the message like you would get from a create call initiated via the ReST API: https://opendev.org/openstack/heat/src/branch/master/heat/engine/service.py#L847-L851 https://opendev.org/openstack/heat/src/branch/master/heat/engine/service.py#L731-L732 - ZB > > Thanks! > > On Wed, Jul 22, 2020 at 3:46 PM Zane Bitter > wrote: > > On 21/07/20 8:03 pm, Laurent Dumont wrote: > > Hi! > > > > We are currently troubleshooting a Heat stack issue where one of the > > stack (one of 25 or so) is failing to be created properly (seemingly > > randomly). > > > > The actual error returned by Heat is quite strange and Google has > been > > quite sparse in terms of references. > > > > The actual error looks like the following (I've sanitized some of > the > > names): > > > > Resource CREATE failed: resources.potato: Resource CREATE failed: > > resources[0]: raw template with id 22273 not found > > When creating a nested stack, rather than just calling the RPC > method to > create a new stack, Heat stores the template in the database first and > passes the ID in the RPC message.[1] (It turns out that by doing it > this > way we can save massive amounts of memory when processing a large tree > of nested stacks.) My best guess is that this message indicates that > the > template row has been deleted by the time the other engine goes to look > at it. > > I don't see how you could have got an ID like 22273 without the > template > having been successfully stored at some point. > > The template is only supposed to be deleted if the RPC call returns > with > an error.[2] The only way I can think of for that to happen before an > attempt to create the child stack is if the RPC call times out, but the > original message is eventually picked up by an engine. I would check > your logs for RPC timeouts and consider increasing them. > > What does the status_reason look like at one level above in the tree? > That should indicate the first error that caused the template to be > deleted. 
> > >     heat resource-list STACK_NAME_HERE -n 50 > > >  +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > >     | resource_name    | physical_resource_id                 | > >     resource_type           | resource_status | updated_time >     | > >     stack_name > >          | > > >  +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > >     | potato              | RESOURCE_ID_HERE | > OS::Heat::ResourceGroup | > >     CREATE_FAILED   | 2020-07-18 T19:52:10Z | > >     nested_stack_1_STACK_NAME_HERE                  | > >     | potato_server_group | RESOURCE_ID_HERE | > OS::Nova::ServerGroup   | > >     CREATE_COMPLETE | 2020-07-21T19:52:10Z | > >     nested_stack_1_STACK_NAME_HERE                  | > >     | 0                |                                      | > >     potato1.yaml     | CREATE_FAILED   | 2020-07-18T19:52:12Z | > >     nested_stack_2_STACK_NAME_HERE | > >     | 1                |                                      | > >     potato1.yaml     | INIT_COMPLETE   | 2020-07- 18 T19:52:12Z | > >     nested_stack_2_STACK_NAME_HERE | > > >  +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > > > > > > The template itself is pretty simple and attempts to create a > > ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling > is that > > one the creation of those machines fails and Heat get's a little > cooky > > and returns an error that might not be the actual root cause. I > would > > have expected the VM to show up in the resource list but I just > see the > > source "yaml". > > It's clear from the above output that the scaled unit of the resource > group is in fact a template (not an OS::Nova::Server), and the error is > occurring trying to create a stack from that template (potato1.yaml) - > before Heat even has a chance to start creating the server. > > > Has anyone seen something similar in the past? > > Nope. > > cheers, > Zane. > > [1] > https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L367-L384 > [2] > https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L335-L342 > > From radoslaw.piliszek at gmail.com Fri Aug 14 07:50:55 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 14 Aug 2020 09:50:55 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: Message-ID: Hi, it's been a month since I wrote the original (quoted) email, so I retry it with CC to the PTL and a recently (this year) active core. I see there have been no meetings and neither Masakari IRC channel nor review queues have been getting much attention during that time period. I am, therefore, offering my help to maintain the project. Regarding the original topic, I would opt for running Masakari meetings during the time I proposed so that interested parties could join and I know there is at least some interest based on recent IRC activity (i.e. there exist people who want to use and discuss Masakari - apart from me that is :-) ). 
-yoctozepto On Mon, Jul 13, 2020 at 9:53 PM Radosław Piliszek wrote: > > Hello Fellow cloud-HA-seekers, > > I wanted to attend Masakari meetings but I found the current schedule unfit. > Is there a chance to change the schedule? The day is fine but a shift > by +3 hours would be nice. > > Anyhow, I wanted to discuss [1]. I've already proposed a change > implementing it and looking forward to positive reviews. :-) That > said, please reply on the change directly, or mail me or catch me on > IRC, whichever option sounds best to you. > > [1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key > > -yoctozepto From yan.y.zhao at intel.com Fri Aug 14 05:16:01 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Fri, 14 Aug 2020 13:16:01 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> Message-ID: <20200814051601.GD15344@joy-OptiPlex-7040> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > driver is it handled by? > > It looks that the devlink is for network device specific, and in > > devlink.h, it says > > include/uapi/linux/devlink.h - Network physical device Netlink > > interface, > > > Actually not, I think there used to have some discussion last year and the > conclusion is to remove this comment. > > It supports IB and probably vDPA in the future. > hmm... sorry, I didn't find the referred discussion. only below discussion regarding to why to add devlink. https://www.mail-archive.com/netdev at vger.kernel.org/msg95801.html >This doesn't seem to be too much related to networking? Why can't something >like this be in sysfs? It is related to networking quite bit. There has been couple of iteration of this, including sysfs and configfs implementations. There has been a consensus reached that this should be done by netlink. I believe netlink is really the best for this purpose. Sysfs is not a good idea https://www.mail-archive.com/netdev at vger.kernel.org/msg96102.html >there is already a way to change eth/ib via >echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1 > >sounds like this is another way to achieve the same? It is. However the current way is driver-specific, not correct. For mlx5, we need the same, it cannot be done in this way. Do devlink is the correct way to go. https://lwn.net/Articles/674867/ There a is need for some userspace API that would allow to expose things that are not directly related to any device class like net_device of ib_device, but rather chip-wide/switch-ASIC-wide stuff. 
Use cases: 1) get/set of port type (Ethernet/InfiniBand) 2) monitoring of hardware messages to and from chip 3) setting up port splitters - split port into multiple ones and squash again, enables usage of splitter cable 4) setting up shared buffers - shared among multiple ports within one chip we actually can also retrieve the same information through sysfs, .e.g |- [path to device] |--- migration | |--- self | | |---device_api | | |---mdev_type | | |---software_version | | |---device_id | | |---aggregator | |--- compatible | | |---device_api | | |---mdev_type | | |---software_version | | |---device_id | | |---aggregator > > > I feel like it's not very appropriate for a GPU driver to use > > this interface. Is that right? > > > I think not though most of the users are switch or ethernet devices. It > doesn't prevent you from inventing new abstractions. so need to patch devlink core and the userspace devlink tool? e.g. devlink migration > Note that devlink is based on netlink, netlink has been widely used by > various subsystems other than networking. the advantage of netlink I see is that it can monitor device status and notify upper layer that migration database needs to get updated. But not sure whether openstack would like to use this capability. As Sean said, it's heavy for openstack. it's heavy for vendor driver as well :) And devlink monitor now listens the notification and dumps the state changes. If we want to use it, need to let it forward the notification and dumped info to openstack, right? Thanks Yan From pierre at stackhpc.com Fri Aug 14 07:56:42 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 14 Aug 2020 09:56:42 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: Message-ID: You may also want to try contacting suzhengwei (https://launchpad.net/~sue.sam), we had a discussion in June about potential integration between Masakari and Blazar. On Fri, 14 Aug 2020 at 09:52, Radosław Piliszek wrote: > > Hi, > > it's been a month since I wrote the original (quoted) email, so I > retry it with CC to the PTL and a recently (this year) active core. > > I see there have been no meetings and neither Masakari IRC channel nor > review queues have been getting much attention during that time > period. > I am, therefore, offering my help to maintain the project. > > Regarding the original topic, I would opt for running Masakari > meetings during the time I proposed so that interested parties could > join and I know there is at least some interest based on recent IRC > activity (i.e. there exist people who want to use and discuss Masakari > - apart from me that is :-) ). > > -yoctozepto > > > On Mon, Jul 13, 2020 at 9:53 PM Radosław Piliszek > wrote: > > > > Hello Fellow cloud-HA-seekers, > > > > I wanted to attend Masakari meetings but I found the current schedule unfit. > > Is there a chance to change the schedule? The day is fine but a shift > > by +3 hours would be nice. > > > > Anyhow, I wanted to discuss [1]. I've already proposed a change > > implementing it and looking forward to positive reviews. :-) That > > said, please reply on the change directly, or mail me or catch me on > > IRC, whichever option sounds best to you. 
> > > > [1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key > > > > -yoctozepto > From alexander.dibbo at stfc.ac.uk Fri Aug 14 10:49:34 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Fri, 14 Aug 2020 10:49:34 +0000 Subject: Issue with heat and magnum Message-ID: <08439410328b4d1ab7ca684d5af2c7c7@stfc.ac.uk> Hi, I am having an issue with magnum creating clusters when I have multiple active heat-engine daemons running. I get the following error in the heat engine logs: 2020-08-14 10:36:30.237 598383 INFO heat.engine.resource [req-a2c862eb-370c-4e91-a2c6-dca32c7872ce - - - - -] signal SoftwareDeployment "master_config_deployment" [67ba9ce2-aba5-4c15-a7ea -6b774659a0e2] Stack "kubernetes-test-26-3uzjqqob47fh-kube_masters-mhctjio2b4gh-0-pbhumflm5mn5" [dc66e4d9-0c9b-4b18-a2c6-dd9724fa51a9] : Authentication cannot be scoped to multiple target s. Pick one of: project, domain, trust or unscoped 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource Traceback (most recent call last): 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2462, in _handle_signal 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource signal_result = self.handle_signal(details) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/software_deployment.py", line 514, in handle_signal 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource timeutils.utcnow().isoformat()) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/rpc/client.py", line 788, in signal_software_deployment 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource version='1.6') 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/rpc/client.py", line 89, in call 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource return client.call(ctxt, method, **kwargs) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 165, in call 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource msg_ctxt = self.serializer.serialize_context(ctxt) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/common/messaging.py", line 46, in serialize_context 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource _context = ctxt.to_dict() 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 185, in to_dict 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource 'roles': self.roles, 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 315, in roles 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource self._load_keystone_data() 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 292, in wrapped_f 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource return self.call(f, *args, **kw) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 358, in call 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource do = self.iter(retry_state=retry_state) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File 
"/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 319, in iter 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource return fut.result() 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource return self.__get_result() 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 361, in call 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource result = fn(*args, **kwargs) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 306, in _load_keystone_data 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource auth_ref = self.auth_plugin.get_access(self.keystone_session) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/keystoneauth1/identity/base.py", line 134, in get_access 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource self.auth_ref = self.get_auth_ref(session) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/keystoneauth1/identity/generic/base.py", line 208, in get_auth_ref 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource return self._plugin.get_auth_ref(session, **kwargs) 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/keystoneauth1/identity/v3/base.py", line 144, in get_auth_ref 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource message='Authentication cannot be scoped to multiple' 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource AuthorizationFailure: Authentication cannot be scoped to multiple targets. Pick one of: project, domain, trust or unscoped 2020-08-14 10:36:30.237 598383 ERROR heat.engine.resource 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service [req-a2c862eb-370c-4e91-a2c6-dca32c7872ce - - - - -] Unhandled error in asynchronous task: ResourceFailure: AuthorizationFailure: resources.master_config_deployment: Authentication cannot be scoped to multiple targets. 
Pick one of: project, domain, trust or unscoped 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service Traceback (most recent call last): 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 132, in log_exceptions 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service gt.wait() 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 181, in wait 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service return self._exit_event.wait() 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 132, in wait 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service current.throw(*self._exc) 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 221, in main 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service result = function(*args, **kwargs) 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 123, in _start_with_trace 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service return func(*args, **kwargs) 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1871, in _resource_signal 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service needs_metadata_updates = rsrc.signal(details, need_check) 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2500, in signal 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service self._handle_signal(details) 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2480, in _handle_signal 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service raise failure 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service ResourceFailure: AuthorizationFailure: resources.master_config_deployment: Authentication cannot be scoped to multiple targets. Pi ck one of: project, domain, trust or unscoped 2020-08-14 10:36:30.890 598383 ERROR heat.engine.service Each of the individual heat-engine daemons create magnum clusters correctly when they are the only ones online. Attached are the heat and magnum config files. Any ideas where to look would be appreciated? Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. 
UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: heat.conf.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: magnum.conf.txt URL: From dev.faz at gmail.com Fri Aug 14 11:21:04 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 14 Aug 2020 13:21:04 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: Hello again, just a short update about the results of my tests. I currently see 2 ways of running openstack+rabbitmq 1. without durable-queues and without replication - just one rabbitmq-process which gets (somehow) restarted if it fails. 2. durable-queues and replication Any other combination of these settings leads to more or less issues with * broken / non working bindings * broken queues I think vexxhost is running (1) with their openstack-operator - for reasons. I added [kolla], because kolla-ansible is installing rabbitmq with replication but without durable-queues. May someone point me to the best way to document these findings to some official doc? I think a lot of installations out there will run into issues if - under load - a node fails. Fabian Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < dev.faz at gmail.com>: > Hi, > > just did some short tests today in our test-environment (without durable > queues and without replication): > > * started a rally task to generate some load > * kill-9-ed rabbitmq on one node > * rally task immediately stopped and the cloud (mostly) stopped working > > after some debugging i found (again) exchanges which had bindings to > queues, but these bindings didnt forward any msgs. > Wrote a small script to detect these broken bindings and will now check if > this is "reproducible" > > then I will try "durable queues" and "durable queues with replication" to > see if this helps. Even if I would expect > rabbitmq should be able to handle this without these "hidden broken > bindings" > > This just FYI. > > Fabian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From its-openstack at zohocorp.com Fri Aug 14 11:42:12 2020 From: its-openstack at zohocorp.com (its-openstack at zohocorp.com) Date: Fri, 14 Aug 2020 17:12:12 +0530 Subject: Openstack-Train VCPU issue in Hyper-V Message-ID: <173ecc7045a.1134ca19a23846.8868151533455235252@zohocorp.com> Dear Team,    We are using Openstack-Train in our organization.We have created windows server 2016 Std R2 instances with this flavor m5.xlarge ( RAM - 65536 , Disk - 500 , VCPUs - 16 ).Once Hyper-V future enabled in this instances VCPU count is automatically reduced to 1 core after restart.Even we have enabled nested virtualisation in openstack compute server.Please help us to short out this issue. #cat /sys/module/kvm_intel/parameters/nested Y Regards, Sysadmin. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From smooney at redhat.com  Fri Aug 14 12:46:39 2020
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 14 Aug 2020 13:46:39 +0100
Subject: Openstack-Train VCPU issue in Hyper-V
In-Reply-To: <173ecc7045a.1134ca19a23846.8868151533455235252@zohocorp.com>
References: <173ecc7045a.1134ca19a23846.8868151533455235252@zohocorp.com>
Message-ID: <98557b2765564577d5305ace4bff195777f7c857.camel@redhat.com>

On Fri, 2020-08-14 at 17:12 +0530, its-openstack at zohocorp.com wrote:
> Dear Team,
>
> We are using Openstack-Train in our organization. We have created windows
> server 2016 Std R2 instances with this flavor m5.xlarge (RAM - 65536,
> Disk - 500, VCPUs - 16). Once the Hyper-V feature is enabled in these
> instances, the VCPU count is automatically reduced to 1 core after restart.
> We have even enabled nested virtualisation on the openstack compute server.

Just to confirm: are you using the Hyper-V driver? If so, this sounds like a
Hyper-V bug rather than an OpenStack bug. Have you reached out to Microsoft
for support with this issue? OpenStack itself does not guarantee that nested
virt will be available or will work, and does not guarantee that it will work
across operating systems.

> Please help us to sort out this issue.

I'm off today so I won't be monitoring this - I just saw your email while I
was doing something else - but without more info on what your configuration
is and how it is failing, I don't think people will be able to help you root
cause your issue. The other thing to be aware of is that this is not a
support list. People might have time to help and often do try to help, but
beyond their good nature, if you have an issue you cannot resolve yourself
with pointers from the community, you may need to reach out to your OpenStack
vendor for support, or if you don't have one, engage one of your engineers to
work with the upstream community to root cause and fix the issue. There is no
vendor-customer support relationship between upstream and those that install
it; the list acts as a way for people that develop and use OpenStack to help
each other voluntarily.

> #cat /sys/module/kvm_intel/parameters/nested
> Y

Are you setting this on the host? If so, that implies you are using the
libvirt driver and are actually running Windows Server as a guest and trying
to enable Hyper-V inside a Windows guest. That is not how your email
initially reads, and it is a different part of the code base. When you say
you enable the Hyper-V feature, is that in the Windows OS on the host, or in
a Windows OS in a VM hosted on a Linux host? I don't know of any way that
could alter the vCPUs allocated to the VM. If you are using the libvirt
driver, can you provide the XML before and after you enable the Hyper-V
feature and reboot? If they are still the same, then this is a Windows kernel
bug. If you are not enabling the Hyper-V feature in the Windows OS and are
instead referring to modifying the libvirt XML to add Hyper-V feature flags
to the guest, that is not supported: you are never allowed to modify a
Nova-created guest XML. The way to enable the Hyper-V enlightenments is to
set metadata on the Glance image declaring it a Windows image; Nova will then
enable the Hyper-V enlightenment feature flags in the XML.

> Regards,
> Sysadmin.
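For illustration, a minimal sketch of the image-metadata approach described
above, assuming the libvirt driver; the image name and instance UUID are
placeholders, not values from this thread:

  # mark the image as a Windows guest so the libvirt driver adds the
  # Hyper-V enlightenment flags to the domain XML it generates
  openstack image set --property os_type=windows windows-2016-std

  # optionally expose the host CPU (including VMX) to guests for nested virt;
  # set on each compute node in nova.conf, then restart nova-compute:
  # [libvirt]
  # cpu_mode = host-passthrough

  # compare the generated XML before/after enabling the Hyper-V role in the guest
  virsh dumpxml <instance-uuid> | grep -A 10 hyperv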
From sean.mcginnis at gmx.com Fri Aug 14 12:56:38 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 14 Aug 2020 07:56:38 -0500 Subject: [all][TC] OpenStack Client (OSC) vs python-*clients In-Reply-To: <2956d6bd-320e-34ea-64a0-1001e102d75c@gmail.com> References: <1668118.VLH7GnMWUR@whitebase.usersys.redhat.com> <9cbf9d69a9beb30d03af71e42a3e2446a516292a.camel@redhat.com> <20200813164131.bdmhankpd2qxycux@yuggoth.org> <2956d6bd-320e-34ea-64a0-1001e102d75c@gmail.com> Message-ID: > And if it's to provide one CLI that rules them all, the individual > projects (well, Cinder, anyway) can't stop adding functionality to > cinderclient CLI until the openstackclient CLI has feature parity.  At > least now, you can use one CLI to do all cinder-related stuff.  If we > stop cinderclient CLI development, then you'll need to use > openstackclient for some things (old features + the latest features) > and the cinderclient for all the in between features, which doesn't > seem like progress to me. And in reality, I don't think Cinder can even drop cinderclient even if we get feature parity. We have python-brick-cinderclient-ext that is used in conjunction with python-cinderclient for some standalone use cases. From marino.mrc at gmail.com Fri Aug 14 13:17:50 2020 From: marino.mrc at gmail.com (Marco Marino) Date: Fri, 14 Aug 2020 15:17:50 +0200 Subject: [tripleo] Specify different interface name in single nic vlans without external network installation Message-ID: Hi, I'm trying to install openstack using tripleo on preprovisioned servers. My (desired) environment is quite simple: 1 controller and 1 compute node. Here is what I did: - Installed undercloud with 192.168.25.0/24 as a ctlplan subnet. local_ip = 192.168.25.2; undercloud_public_host=192.168.25.4; undercloud_admin_host = 192.168.25.3 - Installed 2 servers (compute and controller) with centos 8. Hardware requirements are satisfied (32 GB of ram, 100GB disk....) - Manually created user stack on compute and controller and added it to the sudoers list. - Manually Installed openstack repositories (sudo -E tripleo-repos -b ussuri current-tripleo-rdo) - Manually installed openstack required openstack packages: sudo yum install python3-heat-agent* -y Now I'd like to use 192.168.25.0/24 as "installation network" (network used by ansible) and I'm trying to configure one single nic with vlans. Please note that on all servers (undercloud included) I have 2 physical interfaces with the same name: enp1s0 and enp7s0. enp1s0 is used for a completely openstack-detached network: 192.168.2.0/24 and enp7s0 is used for 192.168.25.0/24 More precisely: Controller-0 = 192.168.25.10 Compute-0 = 192.168.25.20 Furthermore, I confirm that I can reach from both nodes using ping and curl (on port 8004). And I added 2 lines in /etc/hosts on undercloud: 192.168.25.10 controller-0 192.168.25.20 compute-0 Now I'm really confused about what to do. 
I tried with: openstack overcloud deploy --templates --disable-validations -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e templates/network-environment-overrides.yaml -e templates/ctlplane-assignment.yaml -e templates/nameservers.yaml -e templates/node-info.yaml -e templates/hostnamemap.yaml -n templates/network_data.yaml node-info.yaml: http://paste.openstack.org/show/796840/ hostnamemap.yaml: http://paste.openstack.org/show/796841/ network-environment-overrides.yaml: http://paste.openstack.org/show/796842/ ctlplane-assignment.yaml: http://paste.openstack.org/show/796843/ nameservers.yaml: http://paste.openstack.org/show/796844/ network_data.yaml: http://paste.openstack.org/show/796845/ How can I specify nic2 for single nic vlans without external network? Please can you provide an example with "openstack overcloud deploy" complete command? I'm reading documentation but I do not understand how to do and it's really frustrating. Thank you, Marco -------------- next part -------------- An HTML attachment was scrubbed... URL: From alterriu at gmail.com Fri Aug 14 14:13:23 2020 From: alterriu at gmail.com (Popoi Zen) Date: Fri, 14 Aug 2020 21:13:23 +0700 Subject: [neutron] How to specify overlay network interface when using OVN and Geneve? Message-ID: Hi, I have used my google fu but I cant find any reference. Just want to know how to specify overlay network when Im using geneve as my overlay protocol? -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Fri Aug 14 14:23:44 2020 From: zigo at debian.org (Thomas Goirand) Date: Fri, 14 Aug 2020 16:23:44 +0200 Subject: [neutron] Implementing BGP over network:routed for IPv6 in Neutron, with DVR capabilities Message-ID: <45544000-52dd-2f05-1c18-235b495d62de@debian.org> Hi, When these patches are approved: https://review.opendev.org/486450 https://review.opendev.org/669395 we will effectively have BGP announcing for floating IPs and router gateways, with a provider network as next BGP HOP. I tested in experimentally, and it does work. There's more work to be done on it to make it better (like, eliminating GARP requests and getting neutron-dynamic-routing to know when a floating moves from one segment to another), as seen in the commends of #669395, but it works. Now, I'd like to have the same feature for IPv6. Having a segmented IPv6 L2 network already works, though isn't this always going through the network nodes still? I see no reason why IPv6 would always go through network nodes, and I would like to eliminate this SPOF. Has anyone worked on this? Or is there anyone with some advice on how to start? Is there some blueprints somewhere? I'm not sure what this implies, and where to start my research on this. But I really would love, moving forward, to have such a feature. Would anyone (try to) contribute this with me? 
Cheers, Thomas Goirand (zigo) From smooney at redhat.com Fri Aug 14 12:30:00 2020 From: smooney at redhat.com (Sean Mooney) Date: Fri, 14 Aug 2020 13:30:00 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200814051601.GD15344@joy-OptiPlex-7040> References: <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> Message-ID: On Fri, 2020-08-14 at 13:16 +0800, Yan Zhao wrote: > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > driver is it handled by? > > > > > > It looks that the devlink is for network device specific, and in > > > devlink.h, it says > > > include/uapi/linux/devlink.h - Network physical device Netlink > > > interface, > > > > > > Actually not, I think there used to have some discussion last year and the > > conclusion is to remove this comment. > > > > It supports IB and probably vDPA in the future. > > > > hmm... sorry, I didn't find the referred discussion. only below discussion > regarding to why to add devlink. > > https://www.mail-archive.com/netdev at vger.kernel.org/msg95801.html > >This doesn't seem to be too much related to networking? Why can't something > >like this be in sysfs? > > It is related to networking quite bit. There has been couple of > iteration of this, including sysfs and configfs implementations. There > has been a consensus reached that this should be done by netlink. I > believe netlink is really the best for this purpose. Sysfs is not a good > idea > > https://www.mail-archive.com/netdev at vger.kernel.org/msg96102.html > >there is already a way to change eth/ib via > >echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1 > > > >sounds like this is another way to achieve the same? > > It is. However the current way is driver-specific, not correct. > For mlx5, we need the same, it cannot be done in this way. Do devlink is > the correct way to go. im not sure i agree with that. standardising a filesystem based api that is used across all vendors is also a valid option. that said if devlink is the right choice form a kerenl perspective by all means use it but i have not heard a convincing argument for why it actually better. with tthat said we have been uing tools like ethtool to manage aspect of nics for decades so its not that strange an idea to use a tool and binary protocoal rather then a text based interface for this but there are advantages to both approches. > > https://lwn.net/Articles/674867/ > There a is need for some userspace API that would allow to expose things > that are not directly related to any device class like net_device of > ib_device, but rather chip-wide/switch-ASIC-wide stuff. 
> > Use cases: > 1) get/set of port type (Ethernet/InfiniBand) > 2) monitoring of hardware messages to and from chip > 3) setting up port splitters - split port into multiple ones and squash again, > enables usage of splitter cable > 4) setting up shared buffers - shared among multiple ports within one chip > > > > we actually can also retrieve the same information through sysfs, .e.g > > > - [path to device] > > |--- migration > | |--- self > | | |---device_api > | | |---mdev_type > | | |---software_version > | | |---device_id > | | |---aggregator > | |--- compatible > | | |---device_api > | | |---mdev_type > | | |---software_version > | | |---device_id > | | |---aggregator > > > > > > > > I feel like it's not very appropriate for a GPU driver to use > > > this interface. Is that right? > > > > > > I think not though most of the users are switch or ethernet devices. It > > doesn't prevent you from inventing new abstractions. > > so need to patch devlink core and the userspace devlink tool? > e.g. devlink migration and devlink python libs if openstack was to use it directly. we do have caes where we just frok a process and execaute a comannd in a shell with or without elevated privladge but we really dont like doing that due to the performacne impacat and security implciations so where we can use python bindign over c apis we do. pyroute2 is the only python lib i know off of the top of my head that support devlink so we would need to enhacne it to support this new devlink api. there may be otherss i have not really looked in the past since we dont need to use devlink at all today. > > > Note that devlink is based on netlink, netlink has been widely used by > > various subsystems other than networking. > > the advantage of netlink I see is that it can monitor device status and > notify upper layer that migration database needs to get updated. > But not sure whether openstack would like to use this capability. > As Sean said, it's heavy for openstack. it's heavy for vendor driver > as well :) > > And devlink monitor now listens the notification and dumps the state > changes. If we want to use it, need to let it forward the notification > and dumped info to openstack, right? i dont think we would use direct devlink monitoring in nova even if it was avaiable. we could but we already poll libvirt and the system for other resouce periodicly. we likely wouldl just add monitoriv via devlink to that periodic task. we certenly would not use it to detect a migration or a need to update a migration database(not sure what that is) in reality if we can consume this info indirectly via a libvirt api that will be the appcoh we will take at least for the libvirt driver in nova. for cyborg they may take a different appoch. we already use pyroute2 in 2 projects, os-vif and neutron and it does have devlink support so the burden of using devlink is not that high for openstack but its a less frineadly interface for configuration tools like ansiable vs a filesystem based approch. > > Thanks > Yan > From satish.txt at gmail.com Fri Aug 14 14:59:28 2020 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 14 Aug 2020 10:59:28 -0400 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: Fabian, what do you mean? >> I think vexxhost is running (1) with their openstack-operator - for reasons. 
On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann wrote: > > Hello again, > > just a short update about the results of my tests. > > I currently see 2 ways of running openstack+rabbitmq > > 1. without durable-queues and without replication - just one rabbitmq-process which gets (somehow) restarted if it fails. > 2. durable-queues and replication > > Any other combination of these settings leads to more or less issues with > > * broken / non working bindings > * broken queues > > I think vexxhost is running (1) with their openstack-operator - for reasons. > > I added [kolla], because kolla-ansible is installing rabbitmq with replication but without durable-queues. > > May someone point me to the best way to document these findings to some official doc? > I think a lot of installations out there will run into issues if - under load - a node fails. > > Fabian > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann : >> >> Hi, >> >> just did some short tests today in our test-environment (without durable queues and without replication): >> >> * started a rally task to generate some load >> * kill-9-ed rabbitmq on one node >> * rally task immediately stopped and the cloud (mostly) stopped working >> >> after some debugging i found (again) exchanges which had bindings to queues, but these bindings didnt forward any msgs. >> Wrote a small script to detect these broken bindings and will now check if this is "reproducible" >> >> then I will try "durable queues" and "durable queues with replication" to see if this helps. Even if I would expect >> rabbitmq should be able to handle this without these "hidden broken bindings" >> >> This just FYI. >> >> Fabian From samuel.mutel at gmail.com Fri Aug 14 15:18:31 2020 From: samuel.mutel at gmail.com (Samuel Mutel) Date: Fri, 14 Aug 2020 17:18:31 +0200 Subject: [Telemetry] Error when sending to prometheus pushgateway In-Reply-To: References: <731c90df-8830-1804-10a8-a9a97a3e2f55@matthias-runge.de> Message-ID: Hello, I didn't find the issue. Somebody could help me ? Thanks. Le mer. 8 juil. 2020 à 17:55, Samuel Mutel a écrit : > Hello, > > Thanks for your help. I tried to test the pushgateway manually and it > seems to work fine. The pushgateway wrote some things on the stdout. > But when I start the ceilometer, nothing happens. I tried to change the IP > to use 127.0.0.1 but nothing. 
> > Here is my ceilometer.conf: > >> [DEFAULT] >> auth_strategy = keystone >> debug = False >> event_dispatchers = gnocchi >> meter_dispatchers = gnocchi >> transport_url = rabbit://openstack:xxxxxx at xx.xx.x.xx >> ,openstack:xxxxxxx at xx.xx.x.xx,openstack:xxxxxxxxx at xx.xx.x.xx/ >> >> [cache] >> backend = dogpile.cache.memcached >> enabled = True >> memcache_servers = xx.xx.x.xx:11211,xx.xx.x.xx:11211,xx.xx.x.xx:11211 >> >> [keystone_authtoken] >> auth_type = password >> auth_uri = https://xxxxxxxxxxxx:5000/v3 >> auth_url = https://xxxxxxxxxxxx:5000 >> memcached_servers = xx.xx.x.xx:11211,xx.xx.x.xx:11211,xx.xx.x.xx:11211 >> password = xxxxxx >> project_domain_id = default >> project_name = service >> region_name = RegionOne >> user_domain_id = default >> username = ceilometer >> www_authenticate_uri = https://xxxxxxxxxxxx:5000 >> >> [notification] >> pipelines = meter >> >> [oslo_messaging_notifications] >> driver = messagingv2 >> >> [oslo_middleware] >> enable_proxy_headers_parsing = True >> >> [publisher] >> telemetry_secret = xxxxxxxxx >> >> [service_credentials] >> auth_type = password >> auth_url =https://xxxxxxxxxxxx:5000 >> password = xxxxxxxxx >> project_domain_id = default >> project_name = service >> region_name = RegionOne >> user_domain_id = default >> username = ceilometer >> > > Here is my event_pipeline.yaml: > >> sources: >> - name: meter_file >> events: >> - "*" >> sinks: >> - prometheus >> >> sinks: >> - name: prometheus >> publishers: >> - prometheus://127.0.0.1:9091/metrics/job/ceilometer >> > > Here is my pipeline.yaml: > >> sources: >> - name: meter_file >> interval: 30 >> meters: >> - "*" >> sinks: >> - prometheus >> >> sinks: >> - name: prometheus >> publishers: >> - prometheus://127.0.0.1:9091/metrics/job/ceilometer >> > > Here is my polling.yaml: > >> --- >> sources: >> - name: some_pollsters >> interval: 300 >> meters: >> - cpu >> - cpu_l3_cache >> - memory.usage >> - network.incoming.bytes >> - network.incoming.packets >> - network.outgoing.bytes >> - network.outgoing.packets >> - disk.device.read.bytes >> - disk.device.read.requests >> - disk.device.write.bytes >> - disk.device.write.requests >> - hardware.cpu.util >> - hardware.memory.used >> - hardware.memory.total >> - hardware.memory.buffer >> - hardware.memory.cached >> - hardware.memory.swap.avail >> - hardware.memory.swap.total >> - hardware.system_stats.io.outgoing.blocks >> - hardware.system_stats.io.incoming.blocks >> - hardware.network.ip.incoming.datagrams >> - hardware.network.ip.outgoing.datagrams >> > > Here is my ceilometer-rootwrap: > >> # Configuration for ceilometer-rootwrap >> # This file should be owned by (and only-writeable by) the root user >> >> [DEFAULT] >> # List of directories to load filter definitions from (separated by ','). >> # These directories MUST all be only writeable by root ! >> filters_path=/etc/ceilometer/rootwrap.d,/usr/share/ceilometer/rootwrap >> >> # List of directories to search executables in, in case filters do not >> # explicitely specify a full path (separated by ',') >> # If not specified, defaults to system PATH environment variable. >> # These directories MUST all be only writeable by root ! >> exec_dirs=/sbin,/usr/sbin,/bin,/usr/bin,/usr/local/sbin,/usr/local/bin >> >> # Enable logging to syslog >> # Default value is False >> use_syslog=False >> >> # Which syslog facility to use. >> # Valid values include auth, authpriv, syslog, user0, user1... >> # Default value is 'syslog' >> syslog_log_facility=syslog >> >> # Which messages to log. 
>> # INFO means log all usage >> # ERROR means only log unsuccessful attempts >> syslog_log_level=ERROR >> > > What configuration is wrong ? > > Le ven. 3 juil. 2020 à 13:53, Matthias Runge a > écrit : > >> Okay, that doesn't really help with debugging though. >> >> Method not allowed is returned eg. when the endpoint expected an http >> push where your browser did an http get (that's correct). >> >> What I'd do next is to configure ceilometer to send to a different http >> endpoint (like a webserver on your workstation, just for debugging >> purposes). >> >> Verify that the push gateway works as expected, >> https://github.com/prometheus/pushgateway >> has some curl commands mentioned for debugging purposes. >> >> >> Matthias >> >> On 03/07/2020 13:07, Samuel Mutel wrote: >> > If I go to http://10.60.4.11:9091/metrics/job/ceilometer with the web >> > browser I receive: Method Not Allowed but i think it's normal. >> > http://10.60.4.11:9091/metrics is working with metrics. >> > >> > The pushgateway and the ceilometer is working on the same host for my >> > test so no network/firewall issue. >> > >> > Logs of the pushgateway is only these ones: >> > level=info ts=2020-07-03T11:04:35.907Z caller=main.go:83 msg="starting >> > pushgateway" version="(version=1.2.0, branch=HEAD, >> > revision=b7e0167e9574f4f88404dde9653ee1d3c940f2eb)" >> > level=info ts=2020-07-03T11:04:35.908Z caller=main.go:84 >> > build_context="(go=go1.13.8, user=root at 0e823ccfff84, >> > date=20200311-18:51:01)" >> > level=info ts=2020-07-03T11:04:35.911Z caller=main.go:137 >> > listen_address=:9091 >> > >> > Le ven. 3 juil. 2020 à 12:14, Matthias Runge > > > a écrit : >> > >> > On 03/07/2020 11:25, Samuel Mutel wrote: >> > > Hello, >> > > >> > > I have two questions about ceilometer (openstack version rocky). >> > > >> > > * First of all, it seems that ceilometer is sending metrics >> > every hour >> > > and I don't understand why. >> > > * Next, I am not able to setup ceilometer to send metrics to >> > > prometheus pushgateway. 
>> > > >> > > Here is my configuration: >> > > >> > > sources: >> > > - name: meter_file >> > > interval: 30 >> > > meters: >> > > - "*" >> > > sinks: >> > > - prometheus >> > > >> > > sinks: >> > > - name: prometheus >> > > publishers: >> > > - >> > prometheus://10.60.4.11:9091/metrics/job/ceilometer >> > >> > > >> > > >> > > >> > > Here is the error I received: >> > > >> > > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 >> > > # TYPE memory gauge >> > > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} >> 2048 >> > > # TYPE disk.ephemeral.size gauge >> > > >> > >> disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} >> > > 0 >> > > # TYPE disk.root.size gauge >> > > >> > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} >> 0 >> > > : HTTPError: 400 Client Error: Bad Request for url: >> > > http://10.60.4.11:9091/metrics/job/ceilometer >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> > > Traceback (most recent call last): >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> > File >> > > >> "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", >> > > line 178, in _do_post >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> >> > > res.raise_for_status() >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> > File >> > > "/usr/lib/python2.7/dist-packages/requests/models.py", line >> > 935, in >> > > raise_for_status >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> >> > > raise HTTPError(http_error_msg, response=self) >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> > > HTTPError: 400 Client Error: Bad Request for url: >> > > http://10.60.4.11:9091/metrics/job/ceilometer >> > > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http >> > > >> > > >> > > Thanks for your help on this topic. >> > >> > >> > Hi, >> > >> > first obvious question: >> > >> > are you sure that there is something listening under >> > http://10.60.4.11:9091/metrics/job/ceilometer ? >> > >> > Would you have some error logs from the other side? It seems that >> > ceilometer is trying to dispatch as expected. >> > >> > Matthias >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Fri Aug 14 15:42:03 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 14 Aug 2020 10:42:03 -0500 Subject: [release] Release countdown for week R-8 Aug 17 - 21 Message-ID: <20200814154203.GA4129932@sm-workstation> General Information ------------------- We are getting close to some of the end of cycle deadlines. Please be aware of the upcoming non-client library freeze on September 3. The following cycle-with-intermediary deliverables only did one release during the ussuri cycle, and have not done any intermediary release yet during this cycle. The cycle-with-rc release model is more suited for deliverables that plan to be released only once per cycle. As a result, we have suggested [1] as a potential release model change for the following deliverables: adjutant-ui adjutant cloudkitty heat-agents magnum-ui monasca-thresh monasca-ui [1] https://review.opendev.org/#/q/topic:victoria-cwi PTLs and release liaisons for each of those deliverables can either +1 the release model change, or propose an intermediary release for that deliverable. In absence of answer by the end of R-8 week we'll abandon the patch. 
We also published a couple of options for a proposed release schedule for the upcoming Wallaby cycle. Please check out the separate thread: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016391.html 26 week schedule: https://review.opendev.org/745911/ 29 week schedule: https://review.opendev.org/744729/ Upcoming Deadlines & Dates -------------------------- Non-client library freeze: September 3 (R-6 week) Client library freeze: September 10 (R-5 week) Victoria-3 milestone: September 10 (R-5 week) Victoria release: October 14 From dev.faz at gmail.com Fri Aug 14 16:45:56 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 14 Aug 2020 18:45:56 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: Hi, i read somewhere that vexxhosts kubernetes openstack-Operator is running one rabbitmq Container per Service. Just the kubernetes self healing is used as "ha" for rabbitmq. That seems to match with my finding: run rabbitmq standalone and use an external system to restart rabbitmq if required. Fabian Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > Fabian, > > what do you mean? > > >> I think vexxhost is running (1) with their openstack-operator - for > reasons. > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > wrote: > > > > Hello again, > > > > just a short update about the results of my tests. > > > > I currently see 2 ways of running openstack+rabbitmq > > > > 1. without durable-queues and without replication - just one > rabbitmq-process which gets (somehow) restarted if it fails. > > 2. durable-queues and replication > > > > Any other combination of these settings leads to more or less issues with > > > > * broken / non working bindings > > * broken queues > > > > I think vexxhost is running (1) with their openstack-operator - for > reasons. > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > replication but without durable-queues. > > > > May someone point me to the best way to document these findings to some > official doc? > > I think a lot of installations out there will run into issues if - under > load - a node fails. > > > > Fabian > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > dev.faz at gmail.com>: > >> > >> Hi, > >> > >> just did some short tests today in our test-environment (without > durable queues and without replication): > >> > >> * started a rally task to generate some load > >> * kill-9-ed rabbitmq on one node > >> * rally task immediately stopped and the cloud (mostly) stopped working > >> > >> after some debugging i found (again) exchanges which had bindings to > queues, but these bindings didnt forward any msgs. > >> Wrote a small script to detect these broken bindings and will now check > if this is "reproducible" > >> > >> then I will try "durable queues" and "durable queues with replication" > to see if this helps. Even if I would expect > >> rabbitmq should be able to handle this without these "hidden broken > bindings" > >> > >> This just FYI. > >> > >> Fabian > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From smooney at redhat.com Fri Aug 14 19:09:22 2020 From: smooney at redhat.com (Sean Mooney) Date: Fri, 14 Aug 2020 20:09:22 +0100 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > Hi, > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > one rabbitmq Container per Service. Just the kubernetes self healing is > used as "ha" for rabbitmq. > > That seems to match with my finding: run rabbitmq standalone and use an > external system to restart rabbitmq if required. thats the design that was orginally planned for kolla-kubernetes orrignally each service was to be deployed with its own rabbit mq server if it required one and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster and if you trust k8s or the external service enough to ensure it is recteated it should be as effective a solution. you dont even need k8s to do that but it seams to be a good fit if your prepared to ocationally loose inflight rpcs. if you not then you can configure rabbit to persite all message to disk and mont that on a shared file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > Fabian > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > Fabian, > > > > what do you mean? > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > reasons. > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > wrote: > > > > > > Hello again, > > > > > > just a short update about the results of my tests. > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > 1. without durable-queues and without replication - just one > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > 2. durable-queues and replication > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > * broken / non working bindings > > > * broken queues > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > reasons. > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > replication but without durable-queues. > > > > > > May someone point me to the best way to document these findings to some > > > > official doc? > > > I think a lot of installations out there will run into issues if - under > > > > load - a node fails. > > > > > > Fabian > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > dev.faz at gmail.com>: > > > > > > > > Hi, > > > > > > > > just did some short tests today in our test-environment (without > > > > durable queues and without replication): > > > > > > > > * started a rally task to generate some load > > > > * kill-9-ed rabbitmq on one node > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > queues, but these bindings didnt forward any msgs. 
> > > > Wrote a small script to detect these broken bindings and will now check > > > > if this is "reproducible" > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > to see if this helps. Even if I would expect > > > > rabbitmq should be able to handle this without these "hidden broken > > > > bindings" > > > > > > > > This just FYI. > > > > > > > > Fabian From pramchan at yahoo.com Sat Aug 15 02:04:30 2020 From: pramchan at yahoo.com (prakash RAMCHANDRAN) Date: Sat, 15 Aug 2020 02:04:30 +0000 (UTC) Subject: [Interop-WG] Inviting cross-projects discussions for new re-branding efforts (Oct 26) References: <2035126983.2350763.1597457070531.ref@mail.yahoo.com> Message-ID: <2035126983.2350763.1597457070531@mail.yahoo.com> Hi all, I have booked for two hours slot to enable re-branding efforts we are looking to unleash in early 2021.As part of "Open Infrastructure Summit" we kick-start with Inter-op for next decade. Monday October 26 13UTC - 15UTC InteropWG We would like to encourage Open Infrastructure Projects to enlighten the stage with Out-of-Box  thinking & requests for Interop in Marketplace in OSF. - Integrated Projects in OpenStack have well served thru last decades dream team,  that has stood the Tempest tests for RefStackV1 being base for OPNFV-CNTT / ONAP/ and CVP/OVP1 of LFN - Its the turn to the Open Infra Projects like Kata, Airship, Zuul, StarlingX and potential https://openinfralabs.org/ to innovate and suggest the world    How OSF can leverage next-gen  Infra with k8s cluster as baseline for Milt-cluster , Hybrid Cloud, Muti-cloud RefStackV2 for upstream usage for Telco and Edge Clouds We need all Graduated and Incumbent Projects to propose how we can Re-Brand them for Open Infra Containerized workloads. Do you want to use Magnum, Zun, Kolla & Kuryer - refer https://etherpad.opendev.org/p/interop Should we collaborate with LFN re-imagining efforts via our RefStack2  plans as base for Open Infra Summit efforts to give Industry a wake up call to collaborate? Please reply with comments below, where are the global innovators hiding behind Alps &  Himalaya, come and swing your ping pongs balls or Cricket Bats. The Rocky mountains curve balls will always haunt you if you don't speak-up./*================================================================================================================Your comments +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/ Lets join the collaboration for unfinished transition to Containers world with Interoperability. Our committee members are all aligned to back you on our journey and ensure we bring the ideas that matter and execution that lits fire.https://www.openstack.org/summit/2020/vote-for-presentations#/24735 ThanksPrakash RamchandranFor Interop WG / OSF -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza.b2008 at gmail.com Sat Aug 15 13:08:42 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Sat, 15 Aug 2020 17:38:42 +0430 Subject: VM doesn't have internet - OpenStack Ussuri with OVN networking Message-ID: Hi all, I've set up OpenStack Ussuri with OVN networking manually, VMs can ping each other through an internal network. I've created a provider network with valid IP subnet, and my problem is VMs don't have internet access before and after assigning floating IP. 
I've encountered the same problem on TripleO (with dvr), and I just wanted to investigate the problem by manual installation (without HA and DVR), but the same happened.

Everything seems working properly, I can't see any error in logs, here is agent list output:

[root at controller ~]# openstack network agent list
+--------------------------------------+------------------------------+------------------------+-------------------+-------+-------+-------------------------------+
| ID | Agent Type | Host | Availability Zone | Alive | State | Binary |
+--------------------------------------+------------------------------+------------------------+-------------------+-------+-------+-------------------------------+
| 1ade76ae-6caf-4942-8df3-e3bc39d2f12d | OVN Controller Gateway agent | controller.localdomain | n/a | :-) | UP | ovn-controller |
| 484f123f-5935-44ce-aee7-4102271d9f11 | OVN Controller agent | compute.localdomain | n/a | :-) | UP | ovn-controller |
| 01235c13-4f32-4c4f-8cf6-e4b8d59a438a | OVN Metadata agent | compute.localdomain | n/a | :-) | UP | networking-ovn-metadata-agent |
+--------------------------------------+------------------------------+------------------------+-------------------+-------+-------+-------------------------------+

On the controller I got br-ex with a valid IP address. here is the external-ids table on controller and compute node:

[root at controller ~]# ovs-vsctl get Open_vSwitch . external-ids
{hostname=controller.localdomain, ovn-bridge=br-int, ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.0.0.11", ovn-encap-type=geneve, ovn-remote="tcp:10.0.0.11:6642", rundir="/var/run/openvswitch", system-id="1ade76ae-6caf-4942-8df3-e3bc39d2f12d"}

[root at compute ~]# ovs-vsctl get Open_vSwitch . external-ids
{hostname=compute.localdomain, ovn-bridge=br-int, ovn-encap-ip="10.0.0.31", ovn-encap-type=geneve, ovn-remote="tcp:10.0.0.11:6642", rundir="/var/run/openvswitch", system-id="484f123f-5935-44ce-aee7-4102271d9f11"}

and I have:

[root at controller ~]# ovn-nbctl show
switch 72fd5c08-6852-4d7e-b9b4-7e0a1ccdd976 (neutron-b8c66c3d-f47a-42a5-bd2d-c40c435c0376) (aka net01)
    port cf99f43b-0a18-4b91-9ca5-b6ed3f86d994
        type: localport
        addresses: ["fa:16:3e:d0:df:82 192.168.0.100"]
    port 4268f511-bee3-4da0-8835-b9a8664101c4
        addresses: ["fa:16:3e:35:f2:02 192.168.0.135"]
    port 846919e8-cde5-4ba3-b003-0c06e73676ed
        type: router
        router-port: lrp-846919e8-cde5-4ba3-b003-0c06e73676ed
switch bb22224e-e1d1-4bb2-b57e-1058e9fc33a7 (neutron-9614546f-b216-4554-9bfe-e8d6bb11d927) (aka provider)
    port 2f05c7bc-ad0f-4a41-bbd8-5fef1f5bfd2c
        type: localport
        addresses: ["fa:16:3e:17:7b:5b X.X.X.X"]
    port provnet-9614546f-b216-4554-9bfe-e8d6bb11d927
        type: localnet
        addresses: ["unknown"]
    port 23fcdc9d-2d11-40c9-881e-c78e871a3314
        type: router
        router-port: lrp-23fcdc9d-2d11-40c9-881e-c78e871a3314
router 0bd35585-b0a3-4c8f-b71b-cb87c9fad060 (neutron-8cdcd0d2-752c-4130-87bb-d2b7af803ec9) (aka router01)
    port lrp-846919e8-cde5-4ba3-b003-0c06e73676ed
        mac: "fa:16:3e:4d:c3:f9"
        networks: ["192.168.0.1/24"]
    port lrp-23fcdc9d-2d11-40c9-881e-c78e871a3314
        mac: "fa:16:3e:94:89:8e"
        networks: ["X.X.X.X/22"]
    gateway chassis: [1ade76ae-6caf-4942-8df3-e3bc39d2f12d 484f123f-5935-44ce-aee7-4102271d9f11]
    nat 8ef6167a-bc28-4caf-8af5-d0bf12a62545
        external ip: " X.X.X.X "
        logical ip: "192.168.0.135"
        type: "dnat_and_snat"
    nat ba32ab93-3d2b-4199-b634-802f0f438338
        external ip: " X.X.X.X "
        logical ip: "192.168.0.0/24"
        type: "snat"

I replaced valid IPs with X.X.X.X

Any suggestion would be grateful.
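For completeness, this is how the provider bridge mapping on the gateway chassis can be checked and, if missing, set. The physnet name "provider" and the bridge "br-ex" below are placeholders, not values from this deployment, and must match the physical_network of the Neutron provider network:

  # show the mapping currently known to ovn-controller (may be empty)
  ovs-vsctl get Open_vSwitch . external-ids:ovn-bridge-mappings

  # typical way to map the Neutron physical_network to the external bridge
  # on the chassis that has ovn-cms-options=enable-chassis-as-gw
  ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=provider:br-ex

  # the chassis configuration as seen by the OVN southbound DB
  ovn-sbctl list chassis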
Regards, Reza -------------- next part -------------- An HTML attachment was scrubbed... URL: From midhunlaln66 at gmail.com Sat Aug 15 13:28:41 2020 From: midhunlaln66 at gmail.com (Midhunlal Nb) Date: Sat, 15 Aug 2020 18:58:41 +0530 Subject: Trouble to launch a instance in open stack Message-ID: Hi all, --> I created an openstack set up in my networking lab. ---> Vmware installed in one of the blade server then ubuntu 18.04 installed as OS. ---> openstack 5.3.1 version successfully installed in this os ---> In my lab we are using 192.168.x.x/16 network ---> In openstack I created an external network with 192.168.x.x/16 ----> In openstack I created an internal network with 172.16.x.x/16(for testing) -----> then I created 1 external network(provider network),1 router,1 private cloud . ------> In internal network I created 2 instance (172.16.0.2&172.16.0.3)this two instance pinging each other and i am able assign floating ip(192.168.x.x) to this instance ----> Now my problem is I created a original instance with my network(192.168.x.x)in provider network(that instance directly attached to external (provider network)) --->This instance launched successfully and interface ip also our internal ip and dns also showing correct but i am not able to ping our any one of the lab network ip,internet also not available. please help me on this Thanks & Regards Midhunlal N B +918921245637 From noonedeadpunk at ya.ru Sat Aug 15 14:47:06 2020 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Sat, 15 Aug 2020 17:47:06 +0300 Subject: [openstack-ansible] New core on board! Message-ID: <60371597502548@mail.yandex.ru> Hey everyone! Today we have several reasons to celebrate! First of all, we have released OSA Ussuri with tag 21.0.0 (better late than never :p). And even more exciting announcement, that Andrew Bonney is our new OpenStack-Ansible Core reviewer! Even though usual proposal process has been skipped this time, I think everyone will agree that Andrew deserved it and it's high time we congratulated him with becoming part of our team. Welcome on board, Andrew! -- Kind Regards, Dmitriy Rabotyagov From gsteinmuller at vexxhost.com Sat Aug 15 17:16:49 2020 From: gsteinmuller at vexxhost.com (=?UTF-8?Q?Guilherme_Steinm=C3=BCller?=) Date: Sat, 15 Aug 2020 14:16:49 -0300 Subject: [openstack-ansible] New core on board! In-Reply-To: <60371597502548@mail.yandex.ru> References: <60371597502548@mail.yandex.ru> Message-ID: +1 Welcome, Andrew! Regards, Guilherme On Sat, Aug 15, 2020 at 11:52 AM Dmitriy Rabotyagov wrote: > Hey everyone! > > Today we have several reasons to celebrate! > > First of all, we have released OSA Ussuri with tag 21.0.0 (better late > than never :p). > > And even more exciting announcement, that Andrew Bonney is our new > OpenStack-Ansible Core reviewer! Even though usual proposal process has > been skipped this time, I think everyone will agree that Andrew deserved it > and it's high time we congratulated him with becoming part of our team. > > Welcome on board, Andrew! > > -- > Kind Regards, > Dmitriy Rabotyagov > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Sat Aug 15 17:36:35 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Sat, 15 Aug 2020 19:36:35 +0200 Subject: [neutron] How to specify overlay network interface when using OVN and Geneve? In-Reply-To: References: Message-ID: <20200815173635.3z66wzg475d4kzm2@skaplons-mac> Hi, You can do that by configuring bridge_mappings on compute node(s). 
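A minimal sketch of what that looks like on a compute node, with "physnet1", "br-provider" and "eth1" as placeholder names rather than values from any particular deployment:

  # tell ovn-controller which OVS bridge backs the physical network "physnet1"
  ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-provider

  # the bridge just needs the physical interface plugged into it
  ovs-vsctl --may-exist add-br br-provider
  ovs-vsctl --may-exist add-port br-provider eth1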
It is described in the doc [1]. On Fri, Aug 14, 2020 at 09:13:23PM +0700, Popoi Zen wrote: > Hi, I have used my google fu but I cant find any reference. Just want to > know how to specify overlay network when Im using geneve as my overlay > protocol? [1] https://docs.openstack.org/neutron/latest/admin/ovn/refarch/provider-networks.html -- Slawek Kaplonski Principal software engineer Red Hat From smooney at redhat.com Sat Aug 15 18:02:33 2020 From: smooney at redhat.com (Sean Mooney) Date: Sat, 15 Aug 2020 19:02:33 +0100 Subject: [neutron] How to specify overlay network interface when using OVN and Geneve? In-Reply-To: <20200815173635.3z66wzg475d4kzm2@skaplons-mac> References: <20200815173635.3z66wzg475d4kzm2@skaplons-mac> Message-ID: <79cf1f2a19cc242d0030e7ba3c39311aa176e6bf.camel@redhat.com> On Sat, 2020-08-15 at 19:36 +0200, Slawek Kaplonski wrote: > Hi, > > You can do that by configuring bridge_mappings on compute node(s). > It is described in the doc [1]. when they said overlay network i think they meant the geneve tunnels in which casue you contole the interface that is used by adjusting your routing table to use the interface you desire. that can involve movein ipt to bridges or interface to correctly set up the routes depending on your configurtion. but ya if you were refering to provider networks the link slawek porovide is proably what you want. > > On Fri, Aug 14, 2020 at 09:13:23PM +0700, Popoi Zen wrote: > > Hi, I have used my google fu but I cant find any reference. Just want to > > know how to specify overlay network when Im using geneve as my overlay > > protocol? > > [1] https://docs.openstack.org/neutron/latest/admin/ovn/refarch/provider-networks.html > From satish.txt at gmail.com Sat Aug 15 18:10:54 2020 From: satish.txt at gmail.com (Satish Patel) Date: Sat, 15 Aug 2020 14:10:54 -0400 Subject: [openstack-ansible] New core on board! In-Reply-To: References: Message-ID: Congrats Andrew Sent from my iPhone > On Aug 15, 2020, at 1:25 PM, Guilherme Steinmüller wrote: > >  > +1 > > Welcome, Andrew! > > Regards, > Guilherme > >> On Sat, Aug 15, 2020 at 11:52 AM Dmitriy Rabotyagov wrote: >> Hey everyone! >> >> Today we have several reasons to celebrate! >> >> First of all, we have released OSA Ussuri with tag 21.0.0 (better late than never :p). >> >> And even more exciting announcement, that Andrew Bonney is our new OpenStack-Ansible Core reviewer! Even though usual proposal process has been skipped this time, I think everyone will agree that Andrew deserved it and it's high time we congratulated him with becoming part of our team. >> >> Welcome on board, Andrew! >> >> -- >> Kind Regards, >> Dmitriy Rabotyagov >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Sat Aug 15 20:22:42 2020 From: zigo at debian.org (Thomas Goirand) Date: Sat, 15 Aug 2020 22:22:42 +0200 Subject: [Interop-WG] Inviting cross-projects discussions for new re-branding efforts (Oct 26) In-Reply-To: <2035126983.2350763.1597457070531@mail.yahoo.com> References: <2035126983.2350763.1597457070531.ref@mail.yahoo.com> <2035126983.2350763.1597457070531@mail.yahoo.com> Message-ID: On 8/15/20 4:04 AM, prakash RAMCHANDRAN wrote: > Hi all, > > I have booked for two hours slot to enable re-branding efforts we are > looking to unleash in early 2021. > As part of "Open Infrastructure Summit" we kick-start with Inter-op for > next decade. 
> > Monday October 2613UTC - 15UTCInteropWG > > We would like to encourage Open Infrastructure Projects to enlighten the > stage with Out-of-Box  thinking & requests for Interop in Marketplace in > OSF. > > - Integrated Projects in OpenStack have well served thru last decades > dream team,  that has stood the Tempest tests for RefStackV1 being base > for OPNFV-CNTT / ONAP/ and CVP/OVP1 of LFN > > - Its the turn to the Open Infra Projects like Kata, Airship, Zuul, > StarlingX and potential https://openinfralabs.org/ to innovate and > suggest the world >     How OSF can leverage next-gen  Infra with k8s cluster as baseline > for Milt-cluster , Hybrid Cloud, Muti-cloud RefStackV2 for upstream > usage for Telco and Edge Clouds > > We need all Graduated and Incumbent Projects to propose how we can > Re-Brand them for Open Infra Containerized workloads. > > Do you want to use Magnum, Zun, Kolla & Kuryer - > refer https://etherpad.opendev.org/p/interop > > Should we collaborate with LFN re-imagining efforts via our RefStack2  > plans as base for Open Infra Summit efforts to give Industry a wake up > call to collaborate? > > Please reply with comments below, where are the global innovators hiding > behind Alps &  Himalaya, come and swing your ping pongs balls or Cricket > Bats. The Rocky mountains curve balls will always haunt you if you don't > speak-up. I'm sorry if this is an abrupt response to your enthusiastic email, but I'd very much prefer if we made efforts to fill the gap with missing features, and getting understaffed projects on good rails, rather than pushing for more buzz words. I have in mind: - a networking stack that really scales, with IPv6 not as second citizen (ie: that must use centralized network nodes) - stuff like server recue working fully, even with boot from volume - finish the encrypted volume thingy (it's a joke: live-migration with them don't work because of rights issues on Barbican...) - finish the project specific client to openstack client migration (it's taking years....) Also, re-staffing projects like horizon, cloudkitty, telemetry, you-name-it... seems like another challenge of the next decade. Cheers, Thomas Goirand (zigo) From satish.txt at gmail.com Sun Aug 16 00:13:44 2020 From: satish.txt at gmail.com (Satish Patel) Date: Sat, 15 Aug 2020 20:13:44 -0400 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> Message-ID: Hi Sean, Sounds good, but running rabbitmq for each service going to be little overhead also, how do you scale cluster (Yes we can use cellv2 but its not something everyone like to do because of complexity). If we thinks rabbitMQ is growing pain then why community not looking for alternative option (kafka) etc..? On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > Hi, > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > one rabbitmq Container per Service. Just the kubernetes self healing is > > used as "ha" for rabbitmq. > > > > That seems to match with my finding: run rabbitmq standalone and use an > > external system to restart rabbitmq if required. 
> thats the design that was orginally planned for kolla-kubernetes orrignally > > each service was to be deployed with its own rabbit mq server if it required one > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > and if you trust k8s or the external service enough to ensure it is recteated it > should be as effective a solution. you dont even need k8s to do that but it seams to be > a good fit if your prepared to ocationally loose inflight rpcs. > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > Fabian > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > Fabian, > > > > > > what do you mean? > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > reasons. > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > wrote: > > > > > > > > Hello again, > > > > > > > > just a short update about the results of my tests. > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > 2. durable-queues and replication > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > * broken / non working bindings > > > > * broken queues > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > reasons. > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > replication but without durable-queues. > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > official doc? > > > > I think a lot of installations out there will run into issues if - under > > > > > > load - a node fails. > > > > > > > > Fabian > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > dev.faz at gmail.com>: > > > > > > > > > > Hi, > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > durable queues and without replication): > > > > > > > > > > * started a rally task to generate some load > > > > > * kill-9-ed rabbitmq on one node > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > queues, but these bindings didnt forward any msgs. > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > if this is "reproducible" > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > to see if this helps. Even if I would expect > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > bindings" > > > > > > > > > > This just FYI. > > > > > > > > > > Fabian > From alterriu at gmail.com Sun Aug 16 02:58:57 2020 From: alterriu at gmail.com (Popoi Zen) Date: Sun, 16 Aug 2020 09:58:57 +0700 Subject: [neutron] How to specify overlay network interface when using OVN and Geneve? 
In-Reply-To: <79cf1f2a19cc242d0030e7ba3c39311aa176e6bf.camel@redhat.com> References: <20200815173635.3z66wzg475d4kzm2@skaplons-mac> <79cf1f2a19cc242d0030e7ba3c39311aa176e6bf.camel@redhat.com> Message-ID: Yeah, what I mean is tunnel network between instance when instance communicate using selfservice network, can I specify from which host interface/NIC that traffic goes through? I found this: `ovs-vsctl set open . external-ids:ovn-encap-ip=IP_ADDRESS` is it righ? And btw, what is the best practise when using OVN? Did I need setup bridge for overlay interface and provider interface on my controller too? Since, as my understanding, inbound/outbound will have direct access from compute node by default on OVN. And in this guide [1] bridge only configured on compute nodes. [1] https://docs.openstack.org/neutron/ussuri/install/ovn/manual_install.html On Sun, Aug 16, 2020 at 1:02 AM Sean Mooney wrote: > On Sat, 2020-08-15 at 19:36 +0200, Slawek Kaplonski wrote: > > Hi, > > > > You can do that by configuring bridge_mappings on compute node(s). > > It is described in the doc [1]. > when they said overlay network i think they meant the geneve tunnels in > which > casue you contole the interface that is used by adjusting your routing > table to use the interface you desire. > that can involve movein ipt to bridges or interface to correctly set up > the routes depending on your > configurtion. > > but ya if you were refering to provider networks the link slawek porovide > is proably what you want. > > > > On Fri, Aug 14, 2020 at 09:13:23PM +0700, Popoi Zen wrote: > > > Hi, I have used my google fu but I cant find any reference. Just want > to > > > know how to specify overlay network when Im using geneve as my overlay > > > protocol? > > > > [1] > https://docs.openstack.org/neutron/latest/admin/ovn/refarch/provider-networks.html > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alterriu at gmail.com Sun Aug 16 03:08:08 2020 From: alterriu at gmail.com (Popoi Zen) Date: Sun, 16 Aug 2020 10:08:08 +0700 Subject: [neutron][ovn][sfc] Is it possible to use SFC (Service Function Chaining) on provider network? Message-ID: I have look some guide about SFC, but it seems that SFC only used on private/selfservice network. Is it possible to steer traffic between instance when they use provider network? I always getting error when using provider network. Maybe, can I push flow rule direct on OVN database or something like that? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Sun Aug 16 05:40:55 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Sun, 16 Aug 2020 07:40:55 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> Message-ID: Hi, Already looked in Oslo.messaging, but rabbitmq is the only stable driver :( Kafka is marked as experimental and (if the docs are correct) is only usable for notifications. Would love to switch to an alternate. Fabian Satish Patel schrieb am So., 16. Aug. 2020, 02:13: > Hi Sean, > > Sounds good, but running rabbitmq for each service going to be little > overhead also, how do you scale cluster (Yes we can use cellv2 but its > not something everyone like to do because of complexity). If we thinks > rabbitMQ is growing pain then why community not looking for > alternative option (kafka) etc..? 
> > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > > Hi, > > > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is > running > > > one rabbitmq Container per Service. Just the kubernetes self healing is > > > used as "ha" for rabbitmq. > > > > > > That seems to match with my finding: run rabbitmq standalone and use an > > > external system to restart rabbitmq if required. > > thats the design that was orginally planned for kolla-kubernetes > orrignally > > > > each service was to be deployed with its own rabbit mq server if it > required one > > and if it crashed it woudl just be recreated by k8s. it perfromace > better then a cluster > > and if you trust k8s or the external service enough to ensure it is > recteated it > > should be as effective a solution. you dont even need k8s to do that but > it seams to be > > a good fit if your prepared to ocationally loose inflight rpcs. > > if you not then you can configure rabbit to persite all message to disk > and mont that on a shared > > file system like nfs or cephfs so that when the rabbit instance is > recreated the queue contency is > > perserved. assuming you can take the perfromance hit of writing all > messages to disk that is. > > > > > > Fabian > > > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, > 16:59: > > > > > > > Fabian, > > > > > > > > what do you mean? > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - > for > > > > > > > > reasons. > > > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > > wrote: > > > > > > > > > > Hello again, > > > > > > > > > > just a short update about the results of my tests. > > > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > > 2. durable-queues and replication > > > > > > > > > > Any other combination of these settings leads to more or less > issues with > > > > > > > > > > * broken / non working bindings > > > > > * broken queues > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > reasons. > > > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > > > replication but without durable-queues. > > > > > > > > > > May someone point me to the best way to document these findings to > some > > > > > > > > official doc? > > > > > I think a lot of installations out there will run into issues if - > under > > > > > > > > load - a node fails. > > > > > > > > > > Fabian > > > > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > > > dev.faz at gmail.com>: > > > > > > > > > > > > Hi, > > > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > > > durable queues and without replication): > > > > > > > > > > > > * started a rally task to generate some load > > > > > > * kill-9-ed rabbitmq on one node > > > > > > * rally task immediately stopped and the cloud (mostly) stopped > working > > > > > > > > > > > > after some debugging i found (again) exchanges which had > bindings to > > > > > > > > queues, but these bindings didnt forward any msgs. 
> > > > > > Wrote a small script to detect these broken bindings and will > now check > > > > > > > > if this is "reproducible" > > > > > > > > > > > > then I will try "durable queues" and "durable queues with > replication" > > > > > > > > to see if this helps. Even if I would expect > > > > > > rabbitmq should be able to handle this without these "hidden > broken > > > > > > > > bindings" > > > > > > > > > > > > This just FYI. > > > > > > > > > > > > Fabian > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Sun Aug 16 08:48:13 2020 From: tobias.urdin at binero.com (Tobias Urdin) Date: Sun, 16 Aug 2020 08:48:13 +0000 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> , Message-ID: <303EB7E0-C584-42A7-BF7A-D1EAABDD1AD7@binero.com> Hello, Kind of off topic but I’ve been starting doing some research to see if a KubeMQ driver could be added to oslo.messaging Best regards On 16 Aug 2020, at 07:44, Fabian Zimmermann wrote:  Hi, Already looked in Oslo.messaging, but rabbitmq is the only stable driver :( Kafka is marked as experimental and (if the docs are correct) is only usable for notifications. Would love to switch to an alternate. Fabian Satish Patel > schrieb am So., 16. Aug. 2020, 02:13: Hi Sean, Sounds good, but running rabbitmq for each service going to be little overhead also, how do you scale cluster (Yes we can use cellv2 but its not something everyone like to do because of complexity). If we thinks rabbitMQ is growing pain then why community not looking for alternative option (kafka) etc..? On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney > wrote: > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > Hi, > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > one rabbitmq Container per Service. Just the kubernetes self healing is > > used as "ha" for rabbitmq. > > > > That seems to match with my finding: run rabbitmq standalone and use an > > external system to restart rabbitmq if required. > thats the design that was orginally planned for kolla-kubernetes orrignally > > each service was to be deployed with its own rabbit mq server if it required one > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > and if you trust k8s or the external service enough to ensure it is recteated it > should be as effective a solution. you dont even need k8s to do that but it seams to be > a good fit if your prepared to ocationally loose inflight rpcs. > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > Fabian > > > > Satish Patel > schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > Fabian, > > > > > > what do you mean? > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > reasons. > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > wrote: > > > > > > > > Hello again, > > > > > > > > just a short update about the results of my tests. > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > 1. 
without durable-queues and without replication - just one > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > 2. durable-queues and replication > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > * broken / non working bindings > > > > * broken queues > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > reasons. > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > replication but without durable-queues. > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > official doc? > > > > I think a lot of installations out there will run into issues if - under > > > > > > load - a node fails. > > > > > > > > Fabian > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > dev.faz at gmail.com>: > > > > > > > > > > Hi, > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > durable queues and without replication): > > > > > > > > > > * started a rally task to generate some load > > > > > * kill-9-ed rabbitmq on one node > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > queues, but these bindings didnt forward any msgs. > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > if this is "reproducible" > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > to see if this helps. Even if I would expect > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > bindings" > > > > > > > > > > This just FYI. > > > > > > > > > > Fabian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Sun Aug 16 13:37:18 2020 From: smooney at redhat.com (Sean Mooney) Date: Sun, 16 Aug 2020 14:37:18 +0100 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> Message-ID: On Sat, 2020-08-15 at 20:13 -0400, Satish Patel wrote: > Hi Sean, > > Sounds good, but running rabbitmq for each service going to be little > overhead also, how do you scale cluster (Yes we can use cellv2 but its > not something everyone like to do because of complexity). my understanding is that when using rabbitmq adding multiple rabbitmq servers in a cluster lowers througput vs jsut 1 rabbitmq instance for any given excahnge. that is because the content of the queue need to be syconised across the cluster. so if cinder nova and neutron share a 3 node cluster and your compaure that to the same service deployed with cinder nova and neuton each having there on rabbitmq service then the independent deployment will tend to out perform the clustered solution. im not really sure if that has change i know tha thow clustering has been donw has evovled over the years but in the past clustering was the adversary of scaling. > If we thinks > rabbitMQ is growing pain then why community not looking for > alternative option (kafka) etc..? we have looked at alternivives several times rabbit mq wroks well enough ans scales well enough for most deployments. 
there other amqp implimantation that scale better then rabbit, activemq and qpid are both reported to scale better but they perfrom worse out of the box and need to be carfully tuned in the past zeromq has been supported but peole did not maintain it. kafka i dont think is a good alternative but nats https://nats.io/ might be. for what its worth all nova deployment are cellv2 deployments with 1 cell from around pike/rocky and its really not that complex. cells_v1 was much more complex bug part of the redesign for cells_v2 was makeing sure there is only 1 code path. adding a second cell just need another cell db and conductor to be deployed assuming you startted with a super conductor in the first place. the issue is cells is only a nova feature no other service have cells so it does not help you with cinder or neutron. as such cinder an neutron likely be the services that hit scaling limits first. adopign cells in other services is not nessaryally the right approch either but when we talk about scale we do need to keep in mind that cells is just for nova today. > > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > > Hi, > > > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > > one rabbitmq Container per Service. Just the kubernetes self healing is > > > used as "ha" for rabbitmq. > > > > > > That seems to match with my finding: run rabbitmq standalone and use an > > > external system to restart rabbitmq if required. > > > > thats the design that was orginally planned for kolla-kubernetes orrignally > > > > each service was to be deployed with its own rabbit mq server if it required one > > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > > and if you trust k8s or the external service enough to ensure it is recteated it > > should be as effective a solution. you dont even need k8s to do that but it seams to be > > a good fit if your prepared to ocationally loose inflight rpcs. > > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > > > Fabian > > > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > > > Fabian, > > > > > > > > what do you mean? > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > reasons. > > > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > wrote: > > > > > > > > > > Hello again, > > > > > > > > > > just a short update about the results of my tests. > > > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > > 2. durable-queues and replication > > > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > > > * broken / non working bindings > > > > > * broken queues > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > reasons. > > > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > > > replication but without durable-queues. 
> > > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > > > official doc? > > > > > I think a lot of installations out there will run into issues if - under > > > > > > > > load - a node fails. > > > > > > > > > > Fabian > > > > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > > > dev.faz at gmail.com>: > > > > > > > > > > > > Hi, > > > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > > > durable queues and without replication): > > > > > > > > > > > > * started a rally task to generate some load > > > > > > * kill-9-ed rabbitmq on one node > > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > > > queues, but these bindings didnt forward any msgs. > > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > > > if this is "reproducible" > > > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > > > to see if this helps. Even if I would expect > > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > > > bindings" > > > > > > > > > > > > This just FYI. > > > > > > > > > > > > Fabian > > From tonyliu0592 at hotmail.com Sun Aug 16 18:41:20 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Sun, 16 Aug 2020 18:41:20 +0000 Subject: [neutron] How to specify overlay network interface when using OVN and Geneve? In-Reply-To: References: <20200815173635.3z66wzg475d4kzm2@skaplons-mac> <79cf1f2a19cc242d0030e7ba3c39311aa176e6bf.camel@redhat.com> Message-ID: I am using Kolla Ansible that deploys OVN well for me. You can set tunnel_interface to specify the tunnel interface. Tony > -----Original Message----- > From: Popoi Zen > Sent: Saturday, August 15, 2020 7:59 PM > To: Sean Mooney > Cc: Slawek Kaplonski ; openstack- > discuss at lists.openstack.org > Subject: Re: [neutron] How to specify overlay network interface when > using OVN and Geneve? > > Yeah, what I mean is tunnel network between instance when instance > communicate using selfservice network, can I specify from which host > interface/NIC that traffic goes through? I found this: `ovs-vsctl set > open . external-ids:ovn-encap-ip=IP_ADDRESS` is it righ? > > And btw, what is the best practise when using OVN? Did I need setup > bridge for overlay interface and provider interface on my controller too? > Since, as my understanding, inbound/outbound will have direct access > from compute node by default on OVN. And in this guide [1] bridge only > configured on compute nodes. > > [1] > https://docs.openstack.org/neutron/ussuri/install/ovn/manual_install.htm > l > > On Sun, Aug 16, 2020 at 1:02 AM Sean Mooney > wrote: > > > On Sat, 2020-08-15 at 19:36 +0200, Slawek Kaplonski wrote: > > Hi, > > > > You can do that by configuring bridge_mappings on compute node(s). > > It is described in the doc [1]. > when they said overlay network i think they meant the geneve > tunnels in which > casue you contole the interface that is used by adjusting your > routing table to use the interface you desire. > that can involve movein ipt to bridges or interface to correctly > set up the routes depending on your > configurtion. > > but ya if you were refering to provider networks the link slawek > porovide is proably what you want. 
> > > > On Fri, Aug 14, 2020 at 09:13:23PM +0700, Popoi Zen wrote: > > > Hi, I have used my google fu but I cant find any reference. > Just want to > > > know how to specify overlay network when Im using geneve as my > overlay > > > protocol? > > > > [1] > https://docs.openstack.org/neutron/latest/admin/ovn/refarch/provider- > networks.html > > > > From adriant at catalystcloud.nz Mon Aug 17 04:42:32 2020 From: adriant at catalystcloud.nz (Adrian Turjak) Date: Mon, 17 Aug 2020 16:42:32 +1200 Subject: [requirements][oslo] Inclusion of CONFspirator in openstack/requirements Message-ID: Hey OpenStackers! I'm hoping to add CONFspirator to openstack/requirements as I'm using it Adjutant: https://review.opendev.org/#/c/746436/ The library has been in Adjutant for a while but I didn't add it to openstack/requirements, so I'm trying to remedy that now. I think it is different enough from oslo.config and I think the features/differences are ones that are unlikely to ever make sense in oslo.config without breaking it for people who do use it as it is, or adding too much complexity. I wanted to use oslo.config but quickly found that the way I was currently doing config in Adjutant was heavily dependent on yaml, and the ability to nest things. I was in a bind because I didn't have a declarative config system like oslo.config, and the config for Adjutant was a mess to maintain and understand (even for me, and I wrote it) with random parts of the code pulling config that may or may not have been set/declared. After finding oslo.config was not suitable for my rather weird needs, I took oslo.config as a starting point and ended up writing another library specific to my requirements in Adjutant, and rather than keeping it internal to Adjutant, moved it to an external library. CONFspirator was built for a weird and complex edge case, because I have plugins that need to dynamically load config on startup, which then has to be lazy_loaded. I also have weird overlay logic for defaults that can be overridden, and building it into the library made Adjutant simpler. I also have nested config groups that need to be named dynamically to allow plugin classes to be extended without subclasses sharing the same config group name. I built something specific to my needs, that just so happens to also be a potentially useful library for people wanting something like oslo.config but that is targeted towards yaml and toml, and the ability to nest groups. 
The docs are here: https://confspirator.readthedocs.io/ The code is here: https://gitlab.com/catalyst-cloud/confspirator And for those interested in how I use it in Adjutant here are some places of interest (be warned, it may be a rabbit hole): https://opendev.org/openstack/adjutant/src/branch/master/adjutant/config https://opendev.org/openstack/adjutant/src/branch/master/adjutant/feature_set.py https://opendev.org/openstack/adjutant/src/branch/master/adjutant/core.py https://opendev.org/openstack/adjutant/src/branch/master/adjutant/api/v1/openstack.py#L35-L44 https://opendev.org/openstack/adjutant/src/branch/master/adjutant/actions/v1/projects.py#L155-L164 https://opendev.org/openstack/adjutant/src/branch/master/adjutant/actions/v1/base.py#L146 https://opendev.org/openstack/adjutant/src/branch/master/adjutant/tasks/v1/base.py#L30 https://opendev.org/openstack/adjutant/src/branch/master/adjutant/tasks/v1/base.py#L293 If there are strong opinions about working to add this to oslo.config, let's chat, as I'm not against merging this into it somehow if we find a way that make sense, but while some aspects where similar, I felt that this was cleaner without being part of oslo.config because the mindset I was building towards seemed different and oslo.config didn't need my complexity. Cheers, Adrian -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdulko at redhat.com Mon Aug 17 07:46:32 2020 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Mon, 17 Aug 2020 09:46:32 +0200 Subject: [kuryr] vPTG October 2020 Message-ID: <54f84af6378e1507d1f04c0aab733922cdc2c8bd.camel@redhat.com> Hello all, There's a vPTG October 2020 project signup process going on and I'd like to ask if you want me to reserve an hour or two there for a sync up on the priorities and plans of various parts of the team. Thanks, Michał From akekane at redhat.com Mon Aug 17 07:56:36 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Mon, 17 Aug 2020 13:26:36 +0530 Subject: [glance] Virtual PTG October 2020 Message-ID: Hi Team, There is a project signup process going on for virtual PTG October 2020. I will like to book slots for the same between 1400 UTC to 1700 UTC. Please let me know your convenience for the same. Thanks & Best Regards, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Mon Aug 17 08:48:23 2020 From: stephenfin at redhat.com (Stephen Finucane) Date: Mon, 17 Aug 2020 09:48:23 +0100 Subject: [oslo] Proposing Lance Bragstad as oslo.cache core In-Reply-To: References: Message-ID: <5df90046486505f18d1fb812a12e26d1c68cf311.camel@redhat.com> On Thu, 2020-08-13 at 17:06 +0200, Moises Guimaraes de Medeiros wrote: > Hello everybody, > > > > It is my pleasure to propose Lance Bragstad (lbragstad) as a new > member of the oslo.core core team. > Lance has been a big contributor to the project and is known as a > walking version of the Keystone documentation, which happens to be > one of the biggest consumers of oslo.cache. > > > > Obviously we think he'd make a good addition to the core team. If > there are no objections, I'll make that happen in a week. > > > > Thanks. > > +1 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hberaud at redhat.com Mon Aug 17 09:37:01 2020 From: hberaud at redhat.com (Herve Beraud) Date: Mon, 17 Aug 2020 11:37:01 +0200 Subject: [oslo] Proposing Lance Bragstad as oslo.cache core In-Reply-To: <5df90046486505f18d1fb812a12e26d1c68cf311.camel@redhat.com> References: <5df90046486505f18d1fb812a12e26d1c68cf311.camel@redhat.com> Message-ID: +1 Le lun. 17 août 2020 à 10:52, Stephen Finucane a écrit : > On Thu, 2020-08-13 at 17:06 +0200, Moises Guimaraes de Medeiros wrote: > > Hello everybody, > > It is my pleasure to propose Lance Bragstad (lbragstad) as a new member > of the oslo.core core team. > > Lance has been a big contributor to the project and is known as a walking > version of the Keystone documentation, which happens to be one of the > biggest consumers of oslo.cache. > > Obviously we think he'd make a good addition to the core team. If there > are no objections, I'll make that happen in a week. > > Thanks. > > > +1 > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Aug 17 12:01:55 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 17 Aug 2020 08:01:55 -0400 Subject: [neutron][ops] API for viewing HA router states Message-ID: Hi all, Over the past few days, we were troubleshooting an issue that ended up having a root cause where keepalived has somehow ended up active in two different L3 agents. We've yet to find the root cause of how this happened but removing it and adding it resolved the issue for us. As we work on improving our monitoring, we wanted to implement something that gets us the info of # of active routers to check if there's a router that has >1 active L3 agent but it's hard because hitting the /l3-agents endpoint on _every_ single router hurts a lot on performance. Is there something else that we can watch which might be more productive? FYI -- this all goes in the open and will end up inside the openstack-exporter: https://github.com/openstack-exporter/openstack-exporter and the Helm charts will end up with the alerts: https://github.com/openstack-exporter/helm-charts Thanks! Mohammed -- Mohammed Naser VEXXHOST, Inc. From bence.romsics at gmail.com Mon Aug 17 12:18:13 2020 From: bence.romsics at gmail.com (Bence Romsics) Date: Mon, 17 Aug 2020 14:18:13 +0200 Subject: [neutron] bug deputy report for week of 2020-08-10 Message-ID: Hi, This is last week's buglist. Probably due to summer vacations but we have a few bugs without an owner. 
High:

* https://bugs.launchpad.net/neutron/+bug/1891307
  SSH fails in neutron-ovn-tripleo-ci-centos-8-containers-multinode job
  gate-failure, unassigned

* https://bugs.launchpad.net/neutron/+bug/1891309
  Designate integration - internal server error in Neutron
  gate-failure, unassigned

* https://bugs.launchpad.net/neutron/+bug/1891517
  neutron.tests.unit.common.test_utils.TimerTestCase.test__enter_with_timeout fails once in a while
  gate-failure, proposed fix: https://review.opendev.org/746154

* https://bugs.launchpad.net/neutron/+bug/1891673
  qrouter ns ip rules not deleted when fip removed from vm
  proposed fix: https://review.opendev.org/746336

Needs further triage by someone knowing the designate integration better than I do:

* https://bugs.launchpad.net/neutron/+bug/1891333
  strange behavior of dns_domain with designate multi domain

* https://bugs.launchpad.net/neutron/+bug/1891512
  neutron designate DNS dns_domain assignment issue

Low:

* https://bugs.launchpad.net/neutron/+bug/1891243
  neutron tempest failure: neutron_tempest_plugin.api.test_extensions.ExtensionsTest.test_list_extensions_includes_all
  OVN sample devstack conf did not enable all service plugins needed for tempest tests
  proposed fix: https://review.opendev.org/745829

Wishlist:

* https://bugs.launchpad.net/neutron/+bug/1891360
  Floating IP agent gateway IP addresses not released when deleting dead DVR L3 agents
  unassigned

* https://bugs.launchpad.net/neutron/+bug/1891448
  L3 agent mode transition between dvr and dvr_no_external
  unassigned

RFE:

* https://bugs.launchpad.net/neutron/+bug/1891334
  [RFE] Enable change of CIDR on a subnet

Best regards,
Bence (rubasov)

From dev.faz at gmail.com  Mon Aug 17 13:54:31 2020
From: dev.faz at gmail.com (Fabian Zimmermann)
Date: Mon, 17 Aug 2020 15:54:31 +0200
Subject: [neutron][ops] API for viewing HA router states
In-Reply-To:
References:
Message-ID:

Hi,

I can just tell you that we are doing a similar check for the dhcp-agent, but there we just execute a suitable SQL statement to detect more than 1 agent / AZ. Doing the same for L3 shouldn't be that hard, but I don't know if this is what you are looking for?

Fabian

Am Mo., 17. Aug. 2020 um 14:11 Uhr schrieb Mohammed Naser < mnaser at vexxhost.com>:

> Hi all,
>
> Over the past few days, we were troubleshooting an issue that ended up
> having a root cause where keepalived has somehow ended up active in
> two different L3 agents. We've yet to find the root cause of how this
> happened but removing it and adding it resolved the issue for us.
>
> As we work on improving our monitoring, we wanted to implement
> something that gets us the info of # of active routers to check if
> there's a router that has >1 active L3 agent but it's hard because
> hitting the /l3-agents endpoint on _every_ single router hurts a lot
> on performance.
>
> Is there something else that we can watch which might be more
> productive? FYI -- this all goes in the open and will end up inside
> the openstack-exporter:
> https://github.com/openstack-exporter/openstack-exporter and the Helm
> charts will end up with the alerts:
> https://github.com/openstack-exporter/helm-charts
>
> Thanks!
> Mohammed
>
> --
> Mohammed Naser
> VEXXHOST, Inc.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
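As a sketch of what the equivalent L3 check could look like: the query below assumes the usual "neutron" database and the ha_router_agent_port_bindings table that stores the keepalived state reported by the agents, so verify both against the schema of the release in use before relying on it:

  # routers that currently report more than one binding in the "active" state
  mysql neutron -e "
  SELECT router_id, COUNT(*) AS active_l3_agents
    FROM ha_router_agent_port_bindings
   WHERE state = 'active'
   GROUP BY router_id
  HAVING COUNT(*) > 1;"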
URL: From ltoscano at redhat.com Mon Aug 17 13:57:27 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Mon, 17 Aug 2020 09:57:27 -0400 (EDT) Subject: [all][goals] Switch legacy Zuul jobs to native - update #2 In-Reply-To: <1991766177.46483554.1597672564973.JavaMail.zimbra@redhat.com> Message-ID: <54384483.46483679.1597672647276.JavaMail.zimbra@redhat.com> Hi, Much progress has happened since the first report, almost 4 weeks ago. Let's summarize the main documents: - the goal: https://governance.openstack.org/tc/goals/selected/victoria/native-zuulv3-jobs.html - the document above now includes the reference to the up-to-date Zuul v3 porting guide: https://docs.openstack.org/project-team-guide/zuulv3.html - the etherpad which tracks the current status: https://etherpad.opendev.org/p/goal-victoria-native-zuulv3-migration - the previous report: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016058.html If you still have legacy jobs around, please prioritize this porting work. Victoria branching is not far away (actually, quite close for client libraries). This is the list of projects which still need to complete the work. Most of the legacy jobs derive from legacy-dsvm-base, and those are the most important ones. There is a limited amount of jobs which derive from legacy-base that should be taken into account as well, though. - barbican (*) - blazar (*) - cinder (/) (-> yes, this is my fault for the record :) - designate (+) - ec2-api (*) - freezer (*) - heat (/) - infra (/) - ironic (*) - karbor (*) - magnum (*) - manila (/) - but only devstack-base - monasca (+) - murano (/) - neutron (*) - nova (*) - oslo (*) - senlin - trove (+) - vitrage - zaqar The symbol close to the project name provides more detals about the status: (+) means that the project cores are at least aware of the issue, (*) means that there was active pending reviews for some of the remaining jobs, (/) means that there was past activity but no open reviews currently I'd just like to remind everyone that, while not part of the main goal, backporting the new jobs to the older branches when possible will make future maintenance easier. Ciao -- Luigi From dev.faz at gmail.com Mon Aug 17 14:03:39 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 17 Aug 2020 16:03:39 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> Message-ID: Just to keep the list updated. If you run with durable_queues and replication, there is still a possibility, that a short living queue will *not* jet be replicated and a node failure will mark these queue as "unreachable". This wouldnt be a problem, if openstack would create a new queue, but i fear it would just try to reuse the existing after reconnect. So, after all - it seems the less buggy way would be * use durable-queue and replication for long-running queues/exchanges * use non-durable-queue without replication for short (fanout, reply_) queues This should allow the short-living ones to destroy themself on node failure, and the long living ones should be able to be as available as possible. Absolutely untested - so use with caution, but here is a possible policy-regex: ^(?!amq\.)(?!reply_)(?!.*fanout).* Fabian Am So., 16. Aug. 
2020 um 15:37 Uhr schrieb Sean Mooney : > > On Sat, 2020-08-15 at 20:13 -0400, Satish Patel wrote: > > Hi Sean, > > > > Sounds good, but running rabbitmq for each service going to be little > > overhead also, how do you scale cluster (Yes we can use cellv2 but its > > not something everyone like to do because of complexity). > > my understanding is that when using rabbitmq adding multiple rabbitmq servers in a cluster lowers > througput vs jsut 1 rabbitmq instance for any given excahnge. that is because the content of > the queue need to be syconised across the cluster. so if cinder nova and neutron share > a 3 node cluster and your compaure that to the same service deployed with cinder nova and neuton > each having there on rabbitmq service then the independent deployment will tend to out perform the > clustered solution. im not really sure if that has change i know tha thow clustering has been donw has evovled > over the years but in the past clustering was the adversary of scaling. > > > If we thinks > > rabbitMQ is growing pain then why community not looking for > > alternative option (kafka) etc..? > we have looked at alternivives several times > rabbit mq wroks well enough ans scales well enough for most deployments. > there other amqp implimantation that scale better then rabbit, > activemq and qpid are both reported to scale better but they perfrom worse > out of the box and need to be carfully tuned > > in the past zeromq has been supported but peole did not maintain it. > > kafka i dont think is a good alternative but nats https://nats.io/ might be. > > for what its worth all nova deployment are cellv2 deployments with 1 cell from around pike/rocky > and its really not that complex. cells_v1 was much more complex bug part of the redesign > for cells_v2 was makeing sure there is only 1 code path. adding a second cell just need another > cell db and conductor to be deployed assuming you startted with a super conductor in the first > place. the issue is cells is only a nova feature no other service have cells so it does not help > you with cinder or neutron. as such cinder an neutron likely be the services that hit scaling limits first. > adopign cells in other services is not nessaryally the right approch either but when we talk about scale > we do need to keep in mind that cells is just for nova today. > > > > > > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > > > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > > > Hi, > > > > > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > > > one rabbitmq Container per Service. Just the kubernetes self healing is > > > > used as "ha" for rabbitmq. > > > > > > > > That seems to match with my finding: run rabbitmq standalone and use an > > > > external system to restart rabbitmq if required. > > > > > > thats the design that was orginally planned for kolla-kubernetes orrignally > > > > > > each service was to be deployed with its own rabbit mq server if it required one > > > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > > > and if you trust k8s or the external service enough to ensure it is recteated it > > > should be as effective a solution. you dont even need k8s to do that but it seams to be > > > a good fit if your prepared to ocationally loose inflight rpcs. 
> > > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > > > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > > > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > > > > > Fabian > > > > > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > > > > > Fabian, > > > > > > > > > > what do you mean? > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > reasons. > > > > > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > > wrote: > > > > > > > > > > > > Hello again, > > > > > > > > > > > > just a short update about the results of my tests. > > > > > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > > > 2. durable-queues and replication > > > > > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > > > > > * broken / non working bindings > > > > > > * broken queues > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > reasons. > > > > > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > > > > > replication but without durable-queues. > > > > > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > > > > > official doc? > > > > > > I think a lot of installations out there will run into issues if - under > > > > > > > > > > load - a node fails. > > > > > > > > > > > > Fabian > > > > > > > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > > > > > dev.faz at gmail.com>: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > > > > > durable queues and without replication): > > > > > > > > > > > > > > * started a rally task to generate some load > > > > > > > * kill-9-ed rabbitmq on one node > > > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > > > > > queues, but these bindings didnt forward any msgs. > > > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > > > > > if this is "reproducible" > > > > > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > > > > > to see if this helps. Even if I would expect > > > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > > > > > bindings" > > > > > > > > > > > > > > This just FYI. > > > > > > > > > > > > > > Fabian > > > > > From amuller at redhat.com Mon Aug 17 14:03:25 2020 From: amuller at redhat.com (Assaf Muller) Date: Mon, 17 Aug 2020 10:03:25 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: Message-ID: On Mon, Aug 17, 2020 at 9:59 AM Fabian Zimmermann wrote: > > Hi, > > I can just tell you that we are doing a similar check for dhcp-agent, but here we just execute a suitable SQL-statement to detect more than 1 agent / AZ. 
> > Doing the same for L3 shouldn't be that hard, but I dont know if this is what you are looking for? There's already an API for this: neutron l3-agent-list-hosting-router It will show you the HA state per L3 agent for the given router. > > Fabian > > > Am Mo., 17. Aug. 2020 um 14:11 Uhr schrieb Mohammed Naser : >> >> Hi all, >> >> Over the past few days, we were troubleshooting an issue that ended up >> having a root cause where keepalived has somehow ended up active in >> two different L3 agents. We've yet to find the root cause of how this >> happened but removing it and adding it resolved the issue for us. >> >> As we work on improving our monitoring, we wanted to implement >> something that gets us the info of # of active routers to check if >> there's a router that has >1 active L3 agent but it's hard because >> hitting the /l3-agents endpoint on _every_ single router hurts a lot >> on performance. >> >> Is there something else that we can watch which might be more >> productive? FYI -- this all goes in the open and will end up inside >> the openstack-exporter: >> https://github.com/openstack-exporter/openstack-exporter and the Helm >> charts will end up with the alerts: >> https://github.com/openstack-exporter/helm-charts >> >> Thanks! >> Mohammed >> >> -- >> Mohammed Naser >> VEXXHOST, Inc. >> From dev.faz at gmail.com Mon Aug 17 14:05:07 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 17 Aug 2020 16:05:07 +0200 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: Message-ID: Hi, yes for 1 router, but doing this in a loop for hundreds is not so performant ;) Fabian Am Mo., 17. Aug. 2020 um 16:04 Uhr schrieb Assaf Muller : > > On Mon, Aug 17, 2020 at 9:59 AM Fabian Zimmermann wrote: > > > > Hi, > > > > I can just tell you that we are doing a similar check for dhcp-agent, but here we just execute a suitable SQL-statement to detect more than 1 agent / AZ. > > > > Doing the same for L3 shouldn't be that hard, but I dont know if this is what you are looking for? > > There's already an API for this: > neutron l3-agent-list-hosting-router > > It will show you the HA state per L3 agent for the given router. > > > > > Fabian > > > > > > Am Mo., 17. Aug. 2020 um 14:11 Uhr schrieb Mohammed Naser : > >> > >> Hi all, > >> > >> Over the past few days, we were troubleshooting an issue that ended up > >> having a root cause where keepalived has somehow ended up active in > >> two different L3 agents. We've yet to find the root cause of how this > >> happened but removing it and adding it resolved the issue for us. > >> > >> As we work on improving our monitoring, we wanted to implement > >> something that gets us the info of # of active routers to check if > >> there's a router that has >1 active L3 agent but it's hard because > >> hitting the /l3-agents endpoint on _every_ single router hurts a lot > >> on performance. > >> > >> Is there something else that we can watch which might be more > >> productive? FYI -- this all goes in the open and will end up inside > >> the openstack-exporter: > >> https://github.com/openstack-exporter/openstack-exporter and the Helm > >> charts will end up with the alerts: > >> https://github.com/openstack-exporter/helm-charts > >> > >> Thanks! > >> Mohammed > >> > >> -- > >> Mohammed Naser > >> VEXXHOST, Inc. 
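A minimal sketch of the SQL-based check described in this thread, applied to L3 HA state rather than DHCP agents. It assumes direct read access to the Neutron database and the ha_router_agent_port_bindings table (table and column names are taken from recent Neutron releases and may differ in yours); it flags any router for which more than one L3 agent reports an "active" keepalived state:

import pymysql

# Placeholder connection details; point them at the Neutron database.
conn = pymysql.connect(host="db-host", user="neutron",
                       password="secret", database="neutron")

# ha_router_agent_port_bindings keeps one row per (router, L3 agent) pair with
# the keepalived state last reported by that agent; more than one 'active' row
# for the same router is the split-brain condition discussed above.
QUERY = """
    SELECT router_id, COUNT(*) AS active_agents
    FROM ha_router_agent_port_bindings
    WHERE state = 'active'
    GROUP BY router_id
    HAVING COUNT(*) > 1
"""

with conn.cursor() as cur:
    cur.execute(QUERY)
    for router_id, active_agents in cur.fetchall():
        print("router %s has %d active L3 agents" % (router_id, active_agents))

conn.close()

A single aggregate query like this scales to thousands of routers far better than one API call per router, at the cost of depending on the database schema rather than the public API.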
> >> > From yan.y.zhao at intel.com Mon Aug 17 01:52:43 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Mon, 17 Aug 2020 09:52:43 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> Message-ID: <20200817015243.GE15344@joy-OptiPlex-7040> On Fri, Aug 14, 2020 at 01:30:00PM +0100, Sean Mooney wrote: > On Fri, 2020-08-14 at 13:16 +0800, Yan Zhao wrote: > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > driver is it handled by? > > > > > > > > It looks that the devlink is for network device specific, and in > > > > devlink.h, it says > > > > include/uapi/linux/devlink.h - Network physical device Netlink > > > > interface, > > > > > > > > > Actually not, I think there used to have some discussion last year and the > > > conclusion is to remove this comment. > > > > > > It supports IB and probably vDPA in the future. > > > > > > > hmm... sorry, I didn't find the referred discussion. only below discussion > > regarding to why to add devlink. > > > > https://www.mail-archive.com/netdev at vger.kernel.org/msg95801.html > > >This doesn't seem to be too much related to networking? Why can't something > > >like this be in sysfs? > > > > It is related to networking quite bit. There has been couple of > > iteration of this, including sysfs and configfs implementations. There > > has been a consensus reached that this should be done by netlink. I > > believe netlink is really the best for this purpose. Sysfs is not a good > > idea > > > > https://www.mail-archive.com/netdev at vger.kernel.org/msg96102.html > > >there is already a way to change eth/ib via > > >echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1 > > > > > >sounds like this is another way to achieve the same? > > > > It is. However the current way is driver-specific, not correct. > > For mlx5, we need the same, it cannot be done in this way. Do devlink is > > the correct way to go. > im not sure i agree with that. > standardising a filesystem based api that is used across all vendors is also a valid > option. that said if devlink is the right choice form a kerenl perspective by all > means use it but i have not heard a convincing argument for why it actually better. > with tthat said we have been uing tools like ethtool to manage aspect of nics for decades > so its not that strange an idea to use a tool and binary protocoal rather then a text > based interface for this but there are advantages to both approches. > > Yes, I agree with you. > > https://lwn.net/Articles/674867/ > > There a is need for some userspace API that would allow to expose things > > that are not directly related to any device class like net_device of > > ib_device, but rather chip-wide/switch-ASIC-wide stuff. 
> > > > Use cases: > > 1) get/set of port type (Ethernet/InfiniBand) > > 2) monitoring of hardware messages to and from chip > > 3) setting up port splitters - split port into multiple ones and squash again, > > enables usage of splitter cable > > 4) setting up shared buffers - shared among multiple ports within one chip > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > - [path to device] > > > > |--- migration > > | |--- self > > | | |---device_api > > | | |---mdev_type > > | | |---software_version > > | | |---device_id > > | | |---aggregator > > | |--- compatible > > | | |---device_api > > | | |---mdev_type > > | | |---software_version > > | | |---device_id > > | | |---aggregator > > > > > > > > > > > > > I feel like it's not very appropriate for a GPU driver to use > > > > this interface. Is that right? > > > > > > > > > I think not though most of the users are switch or ethernet devices. It > > > doesn't prevent you from inventing new abstractions. > > > > so need to patch devlink core and the userspace devlink tool? > > e.g. devlink migration > and devlink python libs if openstack was to use it directly. > we do have caes where we just frok a process and execaute a comannd in a shell > with or without elevated privladge but we really dont like doing that due to > the performacne impacat and security implciations so where we can use python bindign > over c apis we do. pyroute2 is the only python lib i know off of the top of my head > that support devlink so we would need to enhacne it to support this new devlink api. > there may be otherss i have not really looked in the past since we dont need to use > devlink at all today. > > > > > Note that devlink is based on netlink, netlink has been widely used by > > > various subsystems other than networking. > > > > the advantage of netlink I see is that it can monitor device status and > > notify upper layer that migration database needs to get updated. > > But not sure whether openstack would like to use this capability. > > As Sean said, it's heavy for openstack. it's heavy for vendor driver > > as well :) > > > > And devlink monitor now listens the notification and dumps the state > > changes. If we want to use it, need to let it forward the notification > > and dumped info to openstack, right? > i dont think we would use direct devlink monitoring in nova even if it was avaiable. > we could but we already poll libvirt and the system for other resouce periodicly. so, if we use file system based approach, could openstack periodically check and update the migration info? e.g. every minute, read /sys//migration/self/*, and if there are any file disappearing or appearing or content changes, just let the placement know. Then when about to start migration, check source device's /sys//migration/compatible/* and searches the placement if there are existing device matching to it, if yes, create vm with the device and migrate to it; if not, and if it's an mdev, try to create a matching one and migrate to it. (to create a matching mdev, I guess openstack can follow below sequence: 1. find a target device with the same device id (e.g. parent pci id) 2. create an mdev with matching mdev type 3. adjust other vendor specific attributes 4. if 2 or 3 fails, go to 1 again ) is this approach feasible? > we likely wouldl just add monitoriv via devlink to that periodic task. 
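As a rough illustration of the periodic-polling idea proposed above, a user-space sketch of the compatibility check. The migration/self and migration/compatible attribute layout is only the proposal being discussed in this thread, not an existing kernel interface, and the sysfs paths below are placeholders:

import os

ATTRS = ("device_api", "mdev_type", "software_version", "device_id", "aggregator")

def read_migration_attrs(dev_path, side):
    # side is "self" on the source device or "compatible" on a candidate target.
    base = os.path.join(dev_path, "migration", side)
    attrs = {}
    for name in ATTRS:
        path = os.path.join(base, name)
        if os.path.exists(path):
            with open(path) as f:
                attrs[name] = f.read().strip()
    return attrs

def is_compatible(src_path, dst_path):
    # Compare what the source says about itself with what the target declares
    # it can accept. Plain equality is an oversimplification: software_version,
    # for instance, is meant to allow minor-version forward compatibility.
    src = read_migration_attrs(src_path, "self")
    dst = read_migration_attrs(dst_path, "compatible")
    return all(src.get(name) == value for name, value in dst.items())

# Placeholder device paths for a source mdev and a candidate target.
print(is_compatible("/sys/bus/mdev/devices/SRC-UUID",
                    "/sys/bus/mdev/devices/DST-UUID"))

Something along these lines could run from the same periodic task that already polls libvirt, updating the placement traits only when the attribute set changes.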
> we certenly would not use it to detect a migration or a need to update a migration database(not sure what that is) by migration database, I meant the traits in the placement. :) if a periodic monitoring or devlink is required, then periodically monitor sysfs is also viable, right? > > in reality if we can consume this info indirectly via a libvirt api that will > be the appcoh we will take at least for the libvirt driver in nova. for cyborg > they may take a different appoch. we already use pyroute2 in 2 projects, os-vif and > neutron and it does have devlink support so the burden of using devlink is not that > high for openstack but its a less frineadly interface for configuration tools like > ansiable vs a filesystem based approch. > > From cohuck at redhat.com Mon Aug 17 06:38:28 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Mon, 17 Aug 2020 08:38:28 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <315669b0-5c75-d359-a912-62ebab496abf@linux.ibm.com> References: <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> <20200807135942.5d56a202.cohuck@redhat.com> <20200813173347.239801fa.cohuck@redhat.com> <315669b0-5c75-d359-a912-62ebab496abf@linux.ibm.com> Message-ID: <20200817083828.187315ef.cohuck@redhat.com> On Thu, 13 Aug 2020 15:02:53 -0400 Eric Farman wrote: > On 8/13/20 11:33 AM, Cornelia Huck wrote: > > On Fri, 7 Aug 2020 13:59:42 +0200 > > Cornelia Huck wrote: > > > >> On Wed, 05 Aug 2020 12:35:01 +0100 > >> Sean Mooney wrote: > >> > >>> On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote: > >>>> Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao at intel.com wrote: > >> > >> (...) > >> > >>>>> software_version: device driver's version. > >>>>> in .[.bugfix] scheme, where there is no > >>>>> compatibility across major versions, minor versions have > >>>>> forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and > >>>>> bugfix version number indicates some degree of internal > >>>>> improvement that is not visible to the user in terms of > >>>>> features or compatibility, > >>>>> > >>>>> vendor specific attributes: each vendor may define different attributes > >>>>> device id : device id of a physical devices or mdev's parent pci device. > >>>>> it could be equal to pci id for pci devices > >>>>> aggregator: used together with mdev_type. e.g. aggregator=2 together > >>>>> with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel > >>>>> graphics device. > >>>>> remote_url: for a local NVMe VF, it may be configured with a remote > >>>>> url of a remote storage and all data is stored in the > >>>>> remote side specified by the remote url. > >>>>> ... > >>> just a minor not that i find ^ much more simmple to understand then > >>> the current proposal with self and compatiable. > >>> if i have well defiend attibute that i can parse and understand that allow > >>> me to calulate the what is and is not compatible that is likely going to > >>> more useful as you wont have to keep maintianing a list of other compatible > >>> devices every time a new sku is released. 
> >>> > >>> in anycase thank for actully shareing ^ as it make it simpler to reson about what > >>> you have previously proposed. > >> > >> So, what would be the most helpful format? A 'software_version' field > >> that follows the conventions outlined above, and other (possibly > >> optional) fields that have to match? > > > > Just to get a different perspective, I've been trying to come up with > > what would be useful for a very different kind of device, namely > > vfio-ccw. (Adding Eric to cc: for that.) > > > > software_version makes sense for everybody, so it should be a standard > > attribute. > > > > For the vfio-ccw type, we have only one vendor driver (vfio-ccw_IO). > > > > Given a subchannel A, we want to make sure that subchannel B has a > > reasonable chance of being compatible. I guess that means: > > > > - same subchannel type (I/O) > > - same chpid type (e.g. all FICON; I assume there are no 'mixed' setups > > -- Eric?) > > Correct. > > > - same number of chpids? Maybe we can live without that and just inject > > some machine checks, I don't know. Same chpid numbers is something we > > cannot guarantee, especially if we want to migrate cross-CEC (to > > another machine.) > > I think we'd live without it, because I wouldn't expect it to be > consistent between systems. Yes, and the guest needs to be able to deal with changing path configurations anyway. > > > > > Other possibly interesting information is not available at the > > subchannel level (vfio-ccw is a subchannel driver.) > > I presume you're alluding to the DASD uid (dasdinfo -x) here? Yes, or the even more basic Sense ID information. > > > > > So, looking at a concrete subchannel on one of my machines, it would > > look something like the following: > > > > > > software_version=1.0.0 > > type=vfio-ccw <-- would be vfio-pci on the example above > > > > subchannel_type=0 > > > > chpid_type=0x1a > > chpid_mask=0xf0 <-- not sure if needed/wanted Let's just drop the chpid_mask here. > > > > Does that make sense? Would be interesting if someone could come up with some possible information for a third type of device. From jegor at greenedge.cloud Mon Aug 17 10:15:11 2020 From: jegor at greenedge.cloud (Jegor van Opdorp) Date: Mon, 17 Aug 2020 10:15:11 +0000 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: , Message-ID: We're also using masakari and willing to help maintain it! ________________________________ From: Mark Goddard Sent: Monday, August 17, 2020 12:12 PM To: Jegor van Opdorp Subject: Fwd: [tc][masakari] Project aliveness (was: [masakari] Meetings) ---------- Forwarded message --------- From: Radosław Piliszek Date: Fri, 14 Aug 2020 at 08:53 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) To: openstack-discuss Cc: Sampath Priyankara (samP) , Tushar Patil (tpatil) Hi, it's been a month since I wrote the original (quoted) email, so I retry it with CC to the PTL and a recently (this year) active core. I see there have been no meetings and neither Masakari IRC channel nor review queues have been getting much attention during that time period. I am, therefore, offering my help to maintain the project. Regarding the original topic, I would opt for running Masakari meetings during the time I proposed so that interested parties could join and I know there is at least some interest based on recent IRC activity (i.e. there exist people who want to use and discuss Masakari - apart from me that is :-) ). 
-yoctozepto On Mon, Jul 13, 2020 at 9:53 PM Radosław Piliszek wrote: > > Hello Fellow cloud-HA-seekers, > > I wanted to attend Masakari meetings but I found the current schedule unfit. > Is there a chance to change the schedule? The day is fine but a shift > by +3 hours would be nice. > > Anyhow, I wanted to discuss [1]. I've already proposed a change > implementing it and looking forward to positive reviews. :-) That > said, please reply on the change directly, or mail me or catch me on > IRC, whichever option sounds best to you. > > [1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key > > -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbengt at redhat.com Mon Aug 17 12:08:05 2020 From: dbengt at redhat.com (Daniel Bengtsson) Date: Mon, 17 Aug 2020 14:08:05 +0200 Subject: Can't fetch from opendev. Message-ID: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> Hi everyone, I have tried to fetch the repository tripleo-heat-templates from opendev. I was not able to do that: http://paste.openstack.org/show/796882/ With github it works. I have asked to another colleague to try, he have the same problem. From arnaud.morin at gmail.com Mon Aug 17 14:17:37 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 17 Aug 2020 14:17:37 +0000 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> Message-ID: <20200817141737.GU31915@sync> Hey Fabian, I was thinking the same, and I found the "default" values from openstack-ansible: https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735a68b64cb3c67dd8abeaf324803a9845b/defaults/main.yml#L172 pattern: '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' Which are setting HA for all except amq.* *_fanout_* reply_* So that would make sense? -- Arnaud Morin On 17.08.20 - 16:03, Fabian Zimmermann wrote: > Just to keep the list updated. > > If you run with durable_queues and replication, there is still a > possibility, that a short living queue will *not* jet be replicated > and a node failure will mark these queue as "unreachable". This > wouldnt be a problem, if openstack would create a new queue, but i > fear it would just try to reuse the existing after reconnect. > > So, after all - it seems the less buggy way would be > > * use durable-queue and replication for long-running queues/exchanges > * use non-durable-queue without replication for short (fanout, reply_) queues > > This should allow the short-living ones to destroy themself on node > failure, and the long living ones should be able to be as available as > possible. > > Absolutely untested - so use with caution, but here is a possible > policy-regex: ^(?!amq\.)(?!reply_)(?!.*fanout).* > > Fabian > > > Am So., 16. Aug. 2020 um 15:37 Uhr schrieb Sean Mooney : > > > > On Sat, 2020-08-15 at 20:13 -0400, Satish Patel wrote: > > > Hi Sean, > > > > > > Sounds good, but running rabbitmq for each service going to be little > > > overhead also, how do you scale cluster (Yes we can use cellv2 but its > > > not something everyone like to do because of complexity). > > > > my understanding is that when using rabbitmq adding multiple rabbitmq servers in a cluster lowers > > througput vs jsut 1 rabbitmq instance for any given excahnge. that is because the content of > > the queue need to be syconised across the cluster. 
so if cinder nova and neutron share > > a 3 node cluster and your compaure that to the same service deployed with cinder nova and neuton > > each having there on rabbitmq service then the independent deployment will tend to out perform the > > clustered solution. im not really sure if that has change i know tha thow clustering has been donw has evovled > > over the years but in the past clustering was the adversary of scaling. > > > > > If we thinks > > > rabbitMQ is growing pain then why community not looking for > > > alternative option (kafka) etc..? > > we have looked at alternivives several times > > rabbit mq wroks well enough ans scales well enough for most deployments. > > there other amqp implimantation that scale better then rabbit, > > activemq and qpid are both reported to scale better but they perfrom worse > > out of the box and need to be carfully tuned > > > > in the past zeromq has been supported but peole did not maintain it. > > > > kafka i dont think is a good alternative but nats https://nats.io/ might be. > > > > for what its worth all nova deployment are cellv2 deployments with 1 cell from around pike/rocky > > and its really not that complex. cells_v1 was much more complex bug part of the redesign > > for cells_v2 was makeing sure there is only 1 code path. adding a second cell just need another > > cell db and conductor to be deployed assuming you startted with a super conductor in the first > > place. the issue is cells is only a nova feature no other service have cells so it does not help > > you with cinder or neutron. as such cinder an neutron likely be the services that hit scaling limits first. > > adopign cells in other services is not nessaryally the right approch either but when we talk about scale > > we do need to keep in mind that cells is just for nova today. > > > > > > > > > > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > > > > > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > > > > Hi, > > > > > > > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > > > > one rabbitmq Container per Service. Just the kubernetes self healing is > > > > > used as "ha" for rabbitmq. > > > > > > > > > > That seems to match with my finding: run rabbitmq standalone and use an > > > > > external system to restart rabbitmq if required. > > > > > > > > thats the design that was orginally planned for kolla-kubernetes orrignally > > > > > > > > each service was to be deployed with its own rabbit mq server if it required one > > > > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > > > > and if you trust k8s or the external service enough to ensure it is recteated it > > > > should be as effective a solution. you dont even need k8s to do that but it seams to be > > > > a good fit if your prepared to ocationally loose inflight rpcs. > > > > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > > > > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > > > > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > > > > > > > Fabian > > > > > > > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > > > > > > > Fabian, > > > > > > > > > > > > what do you mean? > > > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > > > reasons. 
> > > > > > > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > > > wrote: > > > > > > > > > > > > > > Hello again, > > > > > > > > > > > > > > just a short update about the results of my tests. > > > > > > > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > > > > 2. durable-queues and replication > > > > > > > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > > > > > > > * broken / non working bindings > > > > > > > * broken queues > > > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > > > reasons. > > > > > > > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > > > > > > > replication but without durable-queues. > > > > > > > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > > > > > > > official doc? > > > > > > > I think a lot of installations out there will run into issues if - under > > > > > > > > > > > > load - a node fails. > > > > > > > > > > > > > > Fabian > > > > > > > > > > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > > > > > > > dev.faz at gmail.com>: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > > > > > > > durable queues and without replication): > > > > > > > > > > > > > > > > * started a rally task to generate some load > > > > > > > > * kill-9-ed rabbitmq on one node > > > > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > > > > > > > queues, but these bindings didnt forward any msgs. > > > > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > > > > > > > if this is "reproducible" > > > > > > > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > > > > > > > to see if this helps. Even if I would expect > > > > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > > > > > > > bindings" > > > > > > > > > > > > > > > > This just FYI. > > > > > > > > > > > > > > > > Fabian > > > > > > > > > From mnaser at vexxhost.com Mon Aug 17 14:17:39 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 17 Aug 2020 10:17:39 -0400 Subject: Can't fetch from opendev. In-Reply-To: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> Message-ID: Hi there, I've reported this to the OpenDev team at #opendev on IRC, I think one of the Gitea backends is likely unhappy. Thanks Mohammed On Mon, Aug 17, 2020 at 10:15 AM Daniel Bengtsson wrote: > > Hi everyone, > > I have tried to fetch the repository tripleo-heat-templates from > opendev. I was not able to do that: > > http://paste.openstack.org/show/796882/ > > With github it works. I have asked to another colleague to try, he have > the same problem. > > -- Mohammed Naser VEXXHOST, Inc. 
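Returning to the RabbitMQ mirroring patterns quoted in the [kolla] thread above, a hedged sketch of applying such a policy through the management HTTP API rather than rabbitmqctl. It assumes the management plugin is enabled on port 15672; host, credentials and vhost are placeholders, and classic-queue mirroring behaviour can differ between RabbitMQ versions:

import requests

# Default vhost "/" is URL-encoded as %2F in the management API.
url = "http://rabbit-host:15672/api/policies/%2F/ha-openstack"

policy = {
    # Same intent as the patterns discussed above: mirror everything except
    # amq.*, *_fanout_* and reply_* queues, which are short-lived anyway.
    "pattern": r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*",
    "definition": {"ha-mode": "all", "ha-sync-mode": "automatic"},
    "apply-to": "queues",
    "priority": 0,
}

resp = requests.put(url, auth=("guest", "guest"), json=policy)
resp.raise_for_status()

Whether the excluded queues should also be non-durable is exactly the open question in this thread, so treat the pattern as a starting point to test under load rather than a recommendation.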
From mnaser at vexxhost.com Mon Aug 17 14:18:51 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 17 Aug 2020 10:18:51 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: Message-ID: Hi all: What Fabian is describing is exactly the problem we're having, there are _many_ routers in these environments so we'd be looking at N requests which can get out of control quickly Thanks Mohammed On Mon, Aug 17, 2020 at 10:05 AM Fabian Zimmermann wrote: > > Hi, > > yes for 1 router, but doing this in a loop for hundreds is not so performant ;) > > Fabian > > Am Mo., 17. Aug. 2020 um 16:04 Uhr schrieb Assaf Muller : > > > > On Mon, Aug 17, 2020 at 9:59 AM Fabian Zimmermann wrote: > > > > > > Hi, > > > > > > I can just tell you that we are doing a similar check for dhcp-agent, but here we just execute a suitable SQL-statement to detect more than 1 agent / AZ. > > > > > > Doing the same for L3 shouldn't be that hard, but I dont know if this is what you are looking for? > > > > There's already an API for this: > > neutron l3-agent-list-hosting-router > > > > It will show you the HA state per L3 agent for the given router. > > > > > > > > Fabian > > > > > > > > > Am Mo., 17. Aug. 2020 um 14:11 Uhr schrieb Mohammed Naser : > > >> > > >> Hi all, > > >> > > >> Over the past few days, we were troubleshooting an issue that ended up > > >> having a root cause where keepalived has somehow ended up active in > > >> two different L3 agents. We've yet to find the root cause of how this > > >> happened but removing it and adding it resolved the issue for us. > > >> > > >> As we work on improving our monitoring, we wanted to implement > > >> something that gets us the info of # of active routers to check if > > >> there's a router that has >1 active L3 agent but it's hard because > > >> hitting the /l3-agents endpoint on _every_ single router hurts a lot > > >> on performance. > > >> > > >> Is there something else that we can watch which might be more > > >> productive? FYI -- this all goes in the open and will end up inside > > >> the openstack-exporter: > > >> https://github.com/openstack-exporter/openstack-exporter and the Helm > > >> charts will end up with the alerts: > > >> https://github.com/openstack-exporter/helm-charts > > >> > > >> Thanks! > > >> Mohammed > > >> > > >> -- > > >> Mohammed Naser > > >> VEXXHOST, Inc. > > >> > > -- Mohammed Naser VEXXHOST, Inc. From dev.faz at gmail.com Mon Aug 17 14:21:34 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Mon, 17 Aug 2020 16:21:34 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: <20200817141737.GU31915@sync> References: <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> <20200817141737.GU31915@sync> Message-ID: Hi, oh, that's great! So, someone at openstack-ansible already detected this and just forgot to update the docs.openstack.org ;) I tested my regex and it seems to fix my issue (atm). I will run an openstack rally load test with the regex above to check what happens if I terminate a rabbitmq while load is hitting the system. Fabian Am Mo., 17. Aug. 
2020 um 16:17 Uhr schrieb Arnaud Morin : > > Hey Fabian, > > I was thinking the same, and I found the "default" values from > openstack-ansible: > https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/fc27e735a68b64cb3c67dd8abeaf324803a9845b/defaults/main.yml#L172 > > pattern: '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' > > Which are setting HA for all except > amq.* > *_fanout_* > reply_* > > So that would make sense? > > -- > Arnaud Morin > > On 17.08.20 - 16:03, Fabian Zimmermann wrote: > > Just to keep the list updated. > > > > If you run with durable_queues and replication, there is still a > > possibility, that a short living queue will *not* jet be replicated > > and a node failure will mark these queue as "unreachable". This > > wouldnt be a problem, if openstack would create a new queue, but i > > fear it would just try to reuse the existing after reconnect. > > > > So, after all - it seems the less buggy way would be > > > > * use durable-queue and replication for long-running queues/exchanges > > * use non-durable-queue without replication for short (fanout, reply_) queues > > > > This should allow the short-living ones to destroy themself on node > > failure, and the long living ones should be able to be as available as > > possible. > > > > Absolutely untested - so use with caution, but here is a possible > > policy-regex: ^(?!amq\.)(?!reply_)(?!.*fanout).* > > > > Fabian > > > > > > Am So., 16. Aug. 2020 um 15:37 Uhr schrieb Sean Mooney : > > > > > > On Sat, 2020-08-15 at 20:13 -0400, Satish Patel wrote: > > > > Hi Sean, > > > > > > > > Sounds good, but running rabbitmq for each service going to be little > > > > overhead also, how do you scale cluster (Yes we can use cellv2 but its > > > > not something everyone like to do because of complexity). > > > > > > my understanding is that when using rabbitmq adding multiple rabbitmq servers in a cluster lowers > > > througput vs jsut 1 rabbitmq instance for any given excahnge. that is because the content of > > > the queue need to be syconised across the cluster. so if cinder nova and neutron share > > > a 3 node cluster and your compaure that to the same service deployed with cinder nova and neuton > > > each having there on rabbitmq service then the independent deployment will tend to out perform the > > > clustered solution. im not really sure if that has change i know tha thow clustering has been donw has evovled > > > over the years but in the past clustering was the adversary of scaling. > > > > > > > If we thinks > > > > rabbitMQ is growing pain then why community not looking for > > > > alternative option (kafka) etc..? > > > we have looked at alternivives several times > > > rabbit mq wroks well enough ans scales well enough for most deployments. > > > there other amqp implimantation that scale better then rabbit, > > > activemq and qpid are both reported to scale better but they perfrom worse > > > out of the box and need to be carfully tuned > > > > > > in the past zeromq has been supported but peole did not maintain it. > > > > > > kafka i dont think is a good alternative but nats https://nats.io/ might be. > > > > > > for what its worth all nova deployment are cellv2 deployments with 1 cell from around pike/rocky > > > and its really not that complex. cells_v1 was much more complex bug part of the redesign > > > for cells_v2 was makeing sure there is only 1 code path. 
adding a second cell just need another > > > cell db and conductor to be deployed assuming you startted with a super conductor in the first > > > place. the issue is cells is only a nova feature no other service have cells so it does not help > > > you with cinder or neutron. as such cinder an neutron likely be the services that hit scaling limits first. > > > adopign cells in other services is not nessaryally the right approch either but when we talk about scale > > > we do need to keep in mind that cells is just for nova today. > > > > > > > > > > > > > > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > > > > > > > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > > > > > Hi, > > > > > > > > > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > > > > > one rabbitmq Container per Service. Just the kubernetes self healing is > > > > > > used as "ha" for rabbitmq. > > > > > > > > > > > > That seems to match with my finding: run rabbitmq standalone and use an > > > > > > external system to restart rabbitmq if required. > > > > > > > > > > thats the design that was orginally planned for kolla-kubernetes orrignally > > > > > > > > > > each service was to be deployed with its own rabbit mq server if it required one > > > > > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > > > > > and if you trust k8s or the external service enough to ensure it is recteated it > > > > > should be as effective a solution. you dont even need k8s to do that but it seams to be > > > > > a good fit if your prepared to ocationally loose inflight rpcs. > > > > > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > > > > > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > > > > > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > > > > > > > > > Fabian > > > > > > > > > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > > > > > > > > > Fabian, > > > > > > > > > > > > > > what do you mean? > > > > > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > > > > > reasons. > > > > > > > > > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > > > > wrote: > > > > > > > > > > > > > > > > Hello again, > > > > > > > > > > > > > > > > just a short update about the results of my tests. > > > > > > > > > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > > > > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > > > > > 2. durable-queues and replication > > > > > > > > > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > > > > > > > > > * broken / non working bindings > > > > > > > > * broken queues > > > > > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > > > > > reasons. > > > > > > > > > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > > > > > > > > > replication but without durable-queues. > > > > > > > > > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > > > > > > > > > official doc? 
> > > > > > > > I think a lot of installations out there will run into issues if - under > > > > > > > > > > > > > > load - a node fails. > > > > > > > > > > > > > > > > Fabian > > > > > > > > > > > > > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > > > > > > > > > dev.faz at gmail.com>: > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > > > > > > > > > durable queues and without replication): > > > > > > > > > > > > > > > > > > * started a rally task to generate some load > > > > > > > > > * kill-9-ed rabbitmq on one node > > > > > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > > > > > > > > > queues, but these bindings didnt forward any msgs. > > > > > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > > > > > > > > > if this is "reproducible" > > > > > > > > > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > > > > > > > > > to see if this helps. Even if I would expect > > > > > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > > > > > > > > > bindings" > > > > > > > > > > > > > > > > > > This just FYI. > > > > > > > > > > > > > > > > > > Fabian > > > > > > > > > > > > > From fungi at yuggoth.org Mon Aug 17 14:37:03 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 17 Aug 2020 14:37:03 +0000 Subject: Can't fetch from opendev. In-Reply-To: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> Message-ID: <20200817143703.c5rh3eqcl3ihxy4m@yuggoth.org> [keeping Daniel in Cc as he doesn't appear to be subscribed] On 2020-08-17 14:08:05 +0200 (+0200), Daniel Bengtsson wrote: [...] > I have tried to fetch the repository tripleo-heat-templates from > opendev. I was not able to do that: > > http://paste.openstack.org/show/796882/ > > With github it works. I have asked to another colleague to try, he > have the same problem. What command(s) did you run and what error message is Git giving you? That paste doesn't look like an error, just a trace of the internal operations which were performed. Are you and your colleague both connecting from the same network? Possibly the same corporate network or the same VPN? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From amuller at redhat.com Mon Aug 17 15:39:44 2020 From: amuller at redhat.com (Assaf Muller) Date: Mon, 17 Aug 2020 11:39:44 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: Message-ID: On Mon, Aug 17, 2020 at 10:19 AM Mohammed Naser wrote: > > Hi all: > > What Fabian is describing is exactly the problem we're having, there > are _many_ routers in these environments so we'd be looking at N > requests which can get out of control quickly I think it's a clear use case to implement a new API endpoint that returns HA state per agent for *all* routers in a single call. Should be easy to implement. 
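Until an aggregated endpoint like the one suggested above exists, one interim workaround is to parallelise the per-router lookups instead of issuing them serially. A rough sketch, assuming an already-authenticated python-neutronclient Client object (thread-safety of the shared client/session should be verified, and error handling is omitted):

from concurrent.futures import ThreadPoolExecutor

def count_active_agents(neutron, router_id):
    # Same data as "neutron l3-agent-list-hosting-router", one router at a time.
    agents = neutron.list_l3_agent_hosting_routers(router_id).get("agents", [])
    return router_id, sum(1 for a in agents if a.get("ha_state") == "active")

def find_split_brain_routers(neutron, max_workers=20):
    ha_routers = [r for r in neutron.list_routers().get("routers", [])
                  if r.get("ha")]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda r: count_active_agents(neutron, r["id"]),
                           ha_routers)
    return [rid for rid, active in results if active != 1]

Routers with zero active agents are usually as interesting as those with two, hence the "!= 1" check; either way this is still N+1 API calls, which is why a bulk endpoint would help the exporter use case.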
> > Thanks > Mohammed > > On Mon, Aug 17, 2020 at 10:05 AM Fabian Zimmermann wrote: > > > > Hi, > > > > yes for 1 router, but doing this in a loop for hundreds is not so performant ;) > > > > Fabian > > > > Am Mo., 17. Aug. 2020 um 16:04 Uhr schrieb Assaf Muller : > > > > > > On Mon, Aug 17, 2020 at 9:59 AM Fabian Zimmermann wrote: > > > > > > > > Hi, > > > > > > > > I can just tell you that we are doing a similar check for dhcp-agent, but here we just execute a suitable SQL-statement to detect more than 1 agent / AZ. > > > > > > > > Doing the same for L3 shouldn't be that hard, but I dont know if this is what you are looking for? > > > > > > There's already an API for this: > > > neutron l3-agent-list-hosting-router > > > > > > It will show you the HA state per L3 agent for the given router. > > > > > > > > > > > Fabian > > > > > > > > > > > > Am Mo., 17. Aug. 2020 um 14:11 Uhr schrieb Mohammed Naser : > > > >> > > > >> Hi all, > > > >> > > > >> Over the past few days, we were troubleshooting an issue that ended up > > > >> having a root cause where keepalived has somehow ended up active in > > > >> two different L3 agents. We've yet to find the root cause of how this > > > >> happened but removing it and adding it resolved the issue for us. > > > >> > > > >> As we work on improving our monitoring, we wanted to implement > > > >> something that gets us the info of # of active routers to check if > > > >> there's a router that has >1 active L3 agent but it's hard because > > > >> hitting the /l3-agents endpoint on _every_ single router hurts a lot > > > >> on performance. > > > >> > > > >> Is there something else that we can watch which might be more > > > >> productive? FYI -- this all goes in the open and will end up inside > > > >> the openstack-exporter: > > > >> https://github.com/openstack-exporter/openstack-exporter and the Helm > > > >> charts will end up with the alerts: > > > >> https://github.com/openstack-exporter/helm-charts > > > >> > > > >> Thanks! > > > >> Mohammed > > > >> > > > >> -- > > > >> Mohammed Naser > > > >> VEXXHOST, Inc. > > > >> > > > > > > > -- > Mohammed Naser > VEXXHOST, Inc. > From corey.bryant at canonical.com Mon Aug 17 15:59:20 2020 From: corey.bryant at canonical.com (Corey Bryant) Date: Mon, 17 Aug 2020 11:59:20 -0400 Subject: cross-team action items Message-ID: These were the items from the cross team today that need action from our team: Bootstack: What’s the plan for Ceph-osd w/ openstack-on-lxd for Bluestore? Can we give an answer as to whether we plan to deprecate or solve this, one way or another? Bootstack: Gnocchi thread - can we respond to thread to make official decision so they can plan accordingly? ie. if upstream is not supported, we likely won't support, so if we can clarify then bootstack can remove from standard deploys. SEG: LP#1891096 - configuration database support for mimic+: Any value that exists in the configuration database will no longer receive updates from ceph.conf. Will be subscribed to field-high. Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at nemebean.com Mon Aug 17 16:13:15 2020 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 17 Aug 2020 11:13:15 -0500 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: <303EB7E0-C584-42A7-BF7A-D1EAABDD1AD7@binero.com> References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> <303EB7E0-C584-42A7-BF7A-D1EAABDD1AD7@binero.com> Message-ID: <95b6b4d8-c70e-7f78-2659-4d12d315b42d@nemebean.com> On 8/16/20 3:48 AM, Tobias Urdin wrote: > Hello, > > Kind of off topic but I’ve been starting doing some research to see if a > KubeMQ driver could be added to oslo.messaging You may want to take a look at https://docs.openstack.org/oslo.messaging/latest/contributor/supported-messaging-drivers.html We've had bad luck with adding new drivers to oslo.messaging in the past, so we've tried to come up with a policy that gives them the best possible chance of being successful. It does set a rather high bar for integration though. Also take a look at https://review.opendev.org/#/c/692784/ A lot of the discussion there may be relevant to another new driver. > > Best regards > >> On 16 Aug 2020, at 07:44, Fabian Zimmermann wrote: >> >>  >> Hi, >> >> Already looked in Oslo.messaging, but rabbitmq is the only stable >> driver :( >> >> Kafka is marked as experimental and (if the docs are correct) is only >> usable for notifications. >> >> Would love to switch to an alternate. >> >>  Fabian >> >> Satish Patel > >> schrieb am So., 16. Aug. 2020, 02:13: >> >> Hi Sean, >> >> Sounds good, but running rabbitmq for each service going to be little >> overhead also, how do you scale cluster (Yes we can use cellv2 but its >> not something everyone like to do because of complexity). If we thinks >> rabbitMQ is growing pain then why community not looking for >> alternative option (kafka) etc..? >> >> On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney > > wrote: >> > >> > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: >> > > Hi, >> > > >> > > i read somewhere that vexxhosts kubernetes openstack-Operator >> is running >> > > one rabbitmq Container per Service. Just the kubernetes self >> healing is >> > > used as "ha" for rabbitmq. >> > > >> > > That seems to match with my finding: run rabbitmq standalone >> and use an >> > > external system to restart rabbitmq if required. >> > thats the design that was orginally planned for kolla-kubernetes >> orrignally >> > >> > each service was to be deployed with its own rabbit mq server if >> it required one >> > and if it crashed it woudl just be recreated by k8s. it >> perfromace better then a cluster >> > and if you trust k8s or the external service enough to ensure it >> is recteated it >> > should be as effective a solution. you dont even need k8s to do >> that but it seams to be >> > a good fit if  your prepared to ocationally loose inflight rpcs. >> > if you not then you can configure rabbit to persite all message >> to disk and mont that on a shared >> > file system like nfs or cephfs so that when the rabbit instance >> is recreated the queue contency is >> > perserved. assuming you can take the perfromance hit of writing >> all messages to disk that is. >> > > >> > >  Fabian >> > > >> > > Satish Patel > > schrieb am Fr., 14. Aug. 2020, 16:59: >> > > >> > > > Fabian, >> > > > >> > > > what do you mean? >> > > > >> > > > > > I think vexxhost is running (1) with their >> openstack-operator - for >> > > > >> > > > reasons. 
>> > > > >> > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann >> > >> > > > wrote: >> > > > > >> > > > > Hello again, >> > > > > >> > > > > just a short update about the results of my tests. >> > > > > >> > > > > I currently see 2 ways of running openstack+rabbitmq >> > > > > >> > > > > 1. without durable-queues and without replication - just one >> > > > >> > > > rabbitmq-process which gets (somehow) restarted if it fails. >> > > > > 2. durable-queues and replication >> > > > > >> > > > > Any other combination of these settings leads to more or >> less issues with >> > > > > >> > > > > * broken / non working bindings >> > > > > * broken queues >> > > > > >> > > > > I think vexxhost is running (1) with their >> openstack-operator - for >> > > > >> > > > reasons. >> > > > > >> > > > > I added [kolla], because kolla-ansible is installing >> rabbitmq with >> > > > >> > > > replication but without durable-queues. >> > > > > >> > > > > May someone point me to the best way to document these >> findings to some >> > > > >> > > > official doc? >> > > > > I think a lot of installations out there will run into >> issues if - under >> > > > >> > > > load - a node fails. >> > > > > >> > > > >  Fabian >> > > > > >> > > > > >> > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < >> > > > >> > > > dev.faz at gmail.com >: >> > > > > > >> > > > > > Hi, >> > > > > > >> > > > > > just did some short tests today in our test-environment >> (without >> > > > >> > > > durable queues and without replication): >> > > > > > >> > > > > > * started a rally task to generate some load >> > > > > > * kill-9-ed rabbitmq on one node >> > > > > > * rally task immediately stopped and the cloud (mostly) >> stopped working >> > > > > > >> > > > > > after some debugging i found (again) exchanges which had >> bindings to >> > > > >> > > > queues, but these bindings didnt forward any msgs. >> > > > > > Wrote a small script to detect these broken bindings and >> will now check >> > > > >> > > > if this is "reproducible" >> > > > > > >> > > > > > then I will try "durable queues" and "durable queues >> with replication" >> > > > >> > > > to see if this helps. Even if I would expect >> > > > > > rabbitmq should be able to handle this without these >> "hidden broken >> > > > >> > > > bindings" >> > > > > > >> > > > > > This just FYI. >> > > > > > >> > > > > >  Fabian >> > >> From elfosardo at gmail.com Mon Aug 17 16:29:56 2020 From: elfosardo at gmail.com (Riccardo Pittau) Date: Mon, 17 Aug 2020 18:29:56 +0200 Subject: [ironic] next Victoria meetup Message-ID: Hello everyone! The time for the next Ironic virtual meetup is close! It will be an opportunity to review what has been done in the last months, exchange ideas and plan for the time before the upcoming victoria release, with an eye towards the future. We're aiming to have the virtual meetup the first week of September (Monday August 31 - Friday September 4) and split it in two days, with one two-hours slot per day. Please vote for your best time slots here: https://doodle.com/poll/pi4x3kuxamf4nnpu We're planning to leave the vote open at least for the entire week until Friday August 21, so to have enough time to announce the final slots and planning early next week. Thanks! A si biri Riccardo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at nemebean.com Mon Aug 17 16:35:56 2020 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 17 Aug 2020 11:35:56 -0500 Subject: [oslo] Feature Freeze is approaching Message-ID: Hi Oslo contributors, Oslo observes a feature freeze that is earlier than other projects. This is to allow time for features in Oslo to be adopted in the services before their feature freeze. And it's coming up soon. Aug. 28th is the Oslo feature freeze date for this cycle. That leaves about two weeks for features to be merged in Oslo libraries. After that, any features to be merged will require a feature freeze exception, which can be requested on the list. If you have any questions about this feel free to contact me here or on IRC (bnemec in #openstack-oslo). Thanks! -Ben From melwittt at gmail.com Mon Aug 17 17:50:15 2020 From: melwittt at gmail.com (melanie witt) Date: Mon, 17 Aug 2020 10:50:15 -0700 Subject: [neutron][gate] verbose q-svc log files and e-r indexing Message-ID: Hi all, Recently we've noticed elastic search indexing is behind 115 hours [1] and we looked for abnormally large log files being generated in the gate. We found that the q-svc log is very large, one example being 71.6M [2]. There is a lot of Time-Cost profiling output in the log, like this: Aug 17 14:22:23.210076 ubuntu-bionic-ovh-bhs1-0019298855 neutron-server[5168]: DEBUG neutron_lib.utils.helpers [req-75719db1-4abf-4500-bb0a-6d24e82cd4fd req-d88e7052-7da9-4bc9-8b35-5730ae76dcad service neutron] Time-cost: call 48e628cc-8c3a-408d-a36f-b219524480e0 function apply_funcs start {{(pid=5554) wrapper /usr/local/lib/python3.6/dist-packages/neutron_lib/utils/helpers.py:218}} We saw that there was a recent-ish change to remove some of the profiling output [3] but it was only for the get_objects method. Looking at the total number of lines in the file vs the number of lines without apply_funcs Time-Cost output: $ wc -l screen-q-svc.txt 186387 screen-q-svc.txt $ grep -v "function apply_funcs" screen-q-svc.txt|wc -l 102593 Would it be possible to remove this profiling output from the gate log to give elastic search indexing a better chance at keeping up? Or is there something else I've missed that could be made less verbose in the logging? Thanks for your help. Cheers, -melanie [1] http://status.openstack.org/elastic-recheck [2] https://b6ba3b9af8fd7de57099-18aa39cea11f738aa67ebd6bc9fb5e4c.ssl.cf2.rackcdn.com/744958/4/check/tempest-integrated-compute/4421bf9/controller/logs/screen-q-svc.txt [3] https://review.opendev.org/741540 From mnaser at vexxhost.com Mon Aug 17 18:36:34 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 17 Aug 2020 14:36:34 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. 
# Patches ## Open Reviews - Move towards single office hour https://review.opendev.org/745200 - Drop all exceptions for legacy validation https://review.opendev.org/745403 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Fix names inside check-review-status https://review.opendev.org/745913 - Resolution to define distributed leadership for projects https://review.opendev.org/744995 - Move towards dual office hours in diff TZ https://review.opendev.org/746167 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 ## Project Updates - Add python-dracclient to be owned by Hardware Vendor SIG https://review.opendev.org/745564 ## General Changes - Add legacy repository validation https://review.opendev.org/737559 - Clean up expired i18n SIG extra-ATCs https://review.opendev.org/745565 - Sort SIG names in repo owner list https://review.opendev.org/745563 - Drop neutron-vpnaas from legacy projects https://review.opendev.org/745401 - Pierre Riteau as CloudKitty PTL for Victoria https://review.opendev.org/745653 - Declare supported runtimes for Wallaby release https://review.opendev.org/743847 ## Abandoned Changes - Move towards dual office hours https://review.opendev.org/745201 # Email Threads - Zuul Native Jobs Goal Update #2: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016561.html - Masakari Project Aliveness: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016520.html - vPTG October 2020 Signup: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016497.html - OpenStack Client vs python-*clients: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016409.html # Other Reminders - Virtual Summit Community voting closes Monday, August 17 at 11:59pm Pacific Time Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From kennelson11 at gmail.com Mon Aug 17 20:46:32 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 17 Aug 2020 13:46:32 -0700 Subject: [all][PTL][TC] Forum Brainstorming Message-ID: Hello Everyone! The Virtual Forum is approaching. We would love 1 volunteer from the community for the Forum Selection Committee. Ideally, the volunteer would already be serving in some capacity in a governance role for your project. In addition to calling for volunteers for the Forum selection committee, this email kicks off the brainstorming period before the CFP tool opens for formal Forum submissions. The categories for brainstorming etherpads have already been setup here[1]. Please add your etherpads and ideas there! The CFP tool will open on August 31st and will close September 14th. For information on the upcoming virtual Summit[2]. For more information on the Forum[3]. Please reach out to jimmy at openstack.org or knelson at openstack.org if you're interested. Volunteers should respond on or before August 31, 2020. Thanks! Kendall (diablo_rojo) [1] Virtual Forum 2020 Wiki: https://wiki.openstack.org/wiki/Forum/Virtual2020 [2] Virtual Open Infra Summit Site: https://www.openstack.org/summit/2020 [3] General Forum Wiki: https://wiki.openstack.org/wiki/Forum -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbengt at redhat.com Mon Aug 17 14:19:29 2020 From: dbengt at redhat.com (Daniel Bengtsson) Date: Mon, 17 Aug 2020 16:19:29 +0200 Subject: Can't fetch from opendev. 
In-Reply-To: References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> Message-ID: <90a2f1fd-b450-fcc0-ccf2-d5ed1e0b7533@redhat.com> On 8/17/20 4:17 PM, Mohammed Naser wrote: > I've reported this to the OpenDev team at #opendev on IRC, I think one > of the Gitea backends is likely unhappy. Thanks a lot for your answer and the report. From cjeanner at redhat.com Tue Aug 18 07:29:20 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Tue, 18 Aug 2020 09:29:20 +0200 Subject: [tripleo] Moving tripleo-ansible-inventory script to tripleo-common? Message-ID: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> Hello there! I'm wondering if we could move the "tripleo-ansible-inventory" script from the tripleo-validations repo to tripleo-common. The main motivation here is to make things consistent: - that script calls content from tripleo-common, nothing from tripleo-validations. - that script isn't only for the validations, so it makes more sense to install it via tripleo-common - in fact, we should probably push that inventory thing as an `openstack tripleo' sub-command, but that's another story So, is there any opposition to this proposal? Cheers, C. -- Cédric Jeanneret (He/Him/His) Sr. Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From ramishra at redhat.com Tue Aug 18 07:53:40 2020 From: ramishra at redhat.com (Rabi Mishra) Date: Tue, 18 Aug 2020 13:23:40 +0530 Subject: [tripleo] Moving tripleo-ansible-inventory script to tripleo-common? In-Reply-To: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> References: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> Message-ID: On Tue, Aug 18, 2020 at 1:07 PM Cédric Jeanneret wrote: > Hello there! > > I'm wondering if we could move the "tripleo-ansible-inventory" script > from the tripleo-validations repo to tripleo-common. > TBH, I don't know the history, but it would be better if we remove all scripts from tripleo-common and use it just as a utility library (now that Mistral is gone). Most of the existing scripts probably have an existing command in tripleoclient. We can implement missing ones including "tripleo-ansible-inventory" in python-tripleoclient. > > The main motivation here is to make things consistent: > - that script calls content from tripleo-common, nothing from > tripleo-validations. > - that script isn't only for the validations, so it makes more sense to > install it via tripleo-common > - in fact, we should probably push that inventory thing as an `openstack > tripleo' sub-command, but that's another story > > So, is there any opposition to this proposal? > > Cheers, > > C. > > > -- > Cédric Jeanneret (He/Him/His) > Sr. Software Engineer - OpenStack Platform > Deployment Framework TC > Red Hat EMEA > https://www.redhat.com/ > > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjeanner at redhat.com Tue Aug 18 08:03:02 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Tue, 18 Aug 2020 10:03:02 +0200 Subject: [tripleo] Moving tripleo-ansible-inventory script to tripleo-common? In-Reply-To: References: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> Message-ID: On 8/18/20 9:53 AM, Rabi Mishra wrote: > > > On Tue, Aug 18, 2020 at 1:07 PM Cédric Jeanneret > wrote: > > Hello there! 
> > I'm wondering if we could move the "tripleo-ansible-inventory" script > from the tripleo-validations repo to tripleo-common. > > > TBH, I don't know the history, but it would be better if we remove all > scripts from tripleo-common and use it just as a utility library (now > that Mistral is gone). Most of the existing scripts probably have an > existing command in tripleoclient. We can implement  missing ones > including "tripleo-ansible-inventory" in python-tripleoclient. would probably be better to implement it directly in tripleoclient imho. In any cases, it has nothing to do in tripleo-validations... I can't connect to launchpad, they are having some auth issue, I can't create an RFE there :(. > > > The main motivation here is to make things consistent: > - that script calls content from tripleo-common, nothing from > tripleo-validations. > - that script isn't only for the validations, so it makes more sense to > install it via tripleo-common > - in fact, we should probably push that inventory thing as an `openstack > tripleo' sub-command, but that's another story > > So, is there any opposition to this proposal? > > Cheers, > > C. > > > -- > Cédric Jeanneret (He/Him/His) > Sr. Software Engineer - OpenStack Platform > Deployment Framework TC > Red Hat EMEA > https://www.redhat.com/ > > > > -- > Regards, > Rabi Mishra > -- Cédric Jeanneret (He/Him/His) Sr. Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From jaosorior at redhat.com Tue Aug 18 08:20:47 2020 From: jaosorior at redhat.com (Juan Osorio Robles) Date: Tue, 18 Aug 2020 11:20:47 +0300 Subject: [tripleo] Moving tripleo-ansible-inventory script to tripleo-common? In-Reply-To: References: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> Message-ID: IIRC it was in tripleo-validations since that was the first and only user of the script at the time. When it got into use by other flows it just never got moved. On Tue, 18 Aug 2020 at 11:11, Cédric Jeanneret wrote: > > > On 8/18/20 9:53 AM, Rabi Mishra wrote: > > > > > > On Tue, Aug 18, 2020 at 1:07 PM Cédric Jeanneret > > wrote: > > > > Hello there! > > > > I'm wondering if we could move the "tripleo-ansible-inventory" script > > from the tripleo-validations repo to tripleo-common. > > > > > > TBH, I don't know the history, but it would be better if we remove all > > scripts from tripleo-common and use it just as a utility library (now > > that Mistral is gone). Most of the existing scripts probably have an > > existing command in tripleoclient. We can implement missing ones > > including "tripleo-ansible-inventory" in python-tripleoclient. > > would probably be better to implement it directly in tripleoclient imho. > In any cases, it has nothing to do in tripleo-validations... > Moving it to tripleoclient makes sense IMO. > I can't connect to launchpad, they are having some auth issue, I can't > create an RFE there :(. > > > > > > > The main motivation here is to make things consistent: > > - that script calls content from tripleo-common, nothing from > > tripleo-validations. 
> > - that script isn't only for the validations, so it makes more sense > to > > install it via tripleo-common > > - in fact, we should probably push that inventory thing as an > `openstack > > tripleo' sub-command, but that's another story > > > > So, is there any opposition to this proposal? > > > > Cheers, > > > > C. > > > > > > -- > > Cédric Jeanneret (He/Him/His) > > Sr. Software Engineer - OpenStack Platform > > Deployment Framework TC > > Red Hat EMEA > > https://www.redhat.com/ > > > > > > > > -- > > Regards, > > Rabi Mishra > > > > -- > Cédric Jeanneret (He/Him/His) > Sr. Software Engineer - OpenStack Platform > Deployment Framework TC > Red Hat EMEA > https://www.redhat.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbultel at redhat.com Tue Aug 18 08:26:56 2020 From: mbultel at redhat.com (Mathieu Bultel) Date: Tue, 18 Aug 2020 10:26:56 +0200 Subject: [tripleo] Moving tripleo-ansible-inventory script to tripleo-common? In-Reply-To: References: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> Message-ID: On Tue, Aug 18, 2020 at 10:08 AM Cédric Jeanneret wrote: > > > On 8/18/20 9:53 AM, Rabi Mishra wrote: > > > > > > On Tue, Aug 18, 2020 at 1:07 PM Cédric Jeanneret > > wrote: > > > > Hello there! > > > > I'm wondering if we could move the "tripleo-ansible-inventory" script > > from the tripleo-validations repo to tripleo-common. > > > > > > TBH, I don't know the history, but it would be better if we remove all > > scripts from tripleo-common and use it just as a utility library (now > > that Mistral is gone). Most of the existing scripts probably have an > > existing command in tripleoclient. We can implement missing ones > > including "tripleo-ansible-inventory" in python-tripleoclient. > > would probably be better to implement it directly in tripleoclient imho. > In any cases, it has nothing to do in tripleo-validations... > +1 with that, it will probably be better to move everything in tripleoclient. > > I can't connect to launchpad, they are having some auth issue, I can't > create an RFE there :(. > > > > > > > The main motivation here is to make things consistent: > > - that script calls content from tripleo-common, nothing from > > tripleo-validations. > > - that script isn't only for the validations, so it makes more sense > to > > install it via tripleo-common > > - in fact, we should probably push that inventory thing as an > `openstack > > tripleo' sub-command, but that's another story > > > > So, is there any opposition to this proposal? > > > > Cheers, > > > > C. > > > > > > -- > > Cédric Jeanneret (He/Him/His) > > Sr. Software Engineer - OpenStack Platform > > Deployment Framework TC > > Red Hat EMEA > > https://www.redhat.com/ > > > > > > > > -- > > Regards, > > Rabi Mishra > > > > -- > Cédric Jeanneret (He/Him/His) > Sr. Software Engineer - OpenStack Platform > Deployment Framework TC > Red Hat EMEA > https://www.redhat.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skaplons at redhat.com Tue Aug 18 10:28:20 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 18 Aug 2020 12:28:20 +0200 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: Message-ID: <20200818102820.l2vxfqhpmetw6gft@skaplons-mac> Hi, On Mon, Aug 17, 2020 at 11:39:44AM -0400, Assaf Muller wrote: > On Mon, Aug 17, 2020 at 10:19 AM Mohammed Naser wrote: > > > > Hi all: > > > > What Fabian is describing is exactly the problem we're having, there > > are _many_ routers in these environments so we'd be looking at N > > requests which can get out of control quickly > > I think it's a clear use case to implement a new API endpoint that > returns HA state per agent for *all* routers in a single call. Should > be easy to implement. I agree with that. Can You maybe propose official RFE for that and describe there Your use case - see [1] for details. > > > > > Thanks > > Mohammed > > > > On Mon, Aug 17, 2020 at 10:05 AM Fabian Zimmermann wrote: > > > > > > Hi, > > > > > > yes for 1 router, but doing this in a loop for hundreds is not so performant ;) > > > > > > Fabian > > > > > > Am Mo., 17. Aug. 2020 um 16:04 Uhr schrieb Assaf Muller : > > > > > > > > On Mon, Aug 17, 2020 at 9:59 AM Fabian Zimmermann wrote: > > > > > > > > > > Hi, > > > > > > > > > > I can just tell you that we are doing a similar check for dhcp-agent, but here we just execute a suitable SQL-statement to detect more than 1 agent / AZ. > > > > > > > > > > Doing the same for L3 shouldn't be that hard, but I dont know if this is what you are looking for? > > > > > > > > There's already an API for this: > > > > neutron l3-agent-list-hosting-router > > > > > > > > It will show you the HA state per L3 agent for the given router. > > > > > > > > > > > > > > Fabian > > > > > > > > > > > > > > > Am Mo., 17. Aug. 2020 um 14:11 Uhr schrieb Mohammed Naser : > > > > >> > > > > >> Hi all, > > > > >> > > > > >> Over the past few days, we were troubleshooting an issue that ended up > > > > >> having a root cause where keepalived has somehow ended up active in > > > > >> two different L3 agents. We've yet to find the root cause of how this > > > > >> happened but removing it and adding it resolved the issue for us. > > > > >> > > > > >> As we work on improving our monitoring, we wanted to implement > > > > >> something that gets us the info of # of active routers to check if > > > > >> there's a router that has >1 active L3 agent but it's hard because > > > > >> hitting the /l3-agents endpoint on _every_ single router hurts a lot > > > > >> on performance. > > > > >> > > > > >> Is there something else that we can watch which might be more > > > > >> productive? FYI -- this all goes in the open and will end up inside > > > > >> the openstack-exporter: > > > > >> https://github.com/openstack-exporter/openstack-exporter and the Helm > > > > >> charts will end up with the alerts: > > > > >> https://github.com/openstack-exporter/helm-charts > > > > >> > > > > >> Thanks! > > > > >> Mohammed > > > > >> > > > > >> -- > > > > >> Mohammed Naser > > > > >> VEXXHOST, Inc. > > > > >> > > > > > > > > > > > > -- > > Mohammed Naser > > VEXXHOST, Inc. 
> > > > [1] https://docs.openstack.org/neutron/latest/contributor/policies/blueprints.html#neutron-request-for-feature-enhancements -- Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Tue Aug 18 10:33:23 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 18 Aug 2020 12:33:23 +0200 Subject: [neutron][gate] verbose q-svc log files and e-r indexing In-Reply-To: References: Message-ID: <20200818103323.wq5upyjn4nzsqhx7@skaplons-mac> Hi, I opened LP for that [1] and I will propose some fix for it ASAP. On Mon, Aug 17, 2020 at 10:50:15AM -0700, melanie witt wrote: > Hi all, > > Recently we've noticed elastic search indexing is behind 115 hours [1] and we looked for abnormally large log files being generated in the gate. > > We found that the q-svc log is very large, one example being 71.6M [2]. There is a lot of Time-Cost profiling output in the log, like this: > > Aug 17 14:22:23.210076 ubuntu-bionic-ovh-bhs1-0019298855 neutron-server[5168]: DEBUG neutron_lib.utils.helpers [req-75719db1-4abf-4500-bb0a-6d24e82cd4fd req-d88e7052-7da9-4bc9-8b35-5730ae76dcad service neutron] Time-cost: call 48e628cc-8c3a-408d-a36f-b219524480e0 function apply_funcs start {{(pid=5554) wrapper /usr/local/lib/python3.6/dist-packages/neutron_lib/utils/helpers.py:218}} > > We saw that there was a recent-ish change to remove some of the profiling output [3] but it was only for the get_objects method. > > Looking at the total number of lines in the file vs the number of lines without apply_funcs Time-Cost output: > > $ wc -l screen-q-svc.txt > 186387 screen-q-svc.txt > > $ grep -v "function apply_funcs" screen-q-svc.txt|wc -l > 102593 > > Would it be possible to remove this profiling output from the gate log to give elastic search indexing a better chance at keeping up? Or is there something else I've missed that could be made less verbose in the logging? > > Thanks for your help. > > Cheers, > -melanie > > [1] http://status.openstack.org/elastic-recheck > [2] https://b6ba3b9af8fd7de57099-18aa39cea11f738aa67ebd6bc9fb5e4c.ssl.cf2.rackcdn.com/744958/4/check/tempest-integrated-compute/4421bf9/controller/logs/screen-q-svc.txt > [3] https://review.opendev.org/741540 > [1] https://bugs.launchpad.net/neutron/+bug/1892017 -- Slawek Kaplonski Principal software engineer Red Hat From thierry at openstack.org Tue Aug 18 10:44:43 2020 From: thierry at openstack.org (Thierry Carrez) Date: Tue, 18 Aug 2020 12:44:43 +0200 Subject: [simplification] Making ask.openstack.org read-only Message-ID: Hi everyone, This has been discussed several times on this mailing list in the past, but we never got to actually pull the plug. Ask.openstack.org was launched in 2013. The reason for hosting our own setup was to be able to support multiple languages, while StackOverflow rejected our proposal to have our own openstack-branded StackExchange site. The Chinese ask.o.o side never really took off. The English side also never really worked perfectly (like email alerts are hopelessly broken), but we figured it would get better with time if a big community formed around it. Fast-forward to 2020 and the instance is lacking volunteers to help run it, while the code (and our customization of it) has become more complicated to maintain. It regularly fails one way or another, and questions there often go unanswered, making us look bad. Of the top 30 users, most have abandoned the platform since 2017, leaving only Bernd Bausch actively engaging and helping moderate questions lately. 
We have called for volunteers several times, but the offers for help never really materialized. At the same time, people are asking OpenStack questions on StackOverflow, and sometimes getting answers there[1]. The fragmentation of the "questions" space is not helping users getting good answers. I think it's time to pull the plug, make ask.openstack.org read-only (so that links to old answers are not lost) and redirect users to the mailing-list and the "OpenStack" tag on StackOverflow. I picked StackOverflow since it seems to have the most openstack questions (2,574 on SO, 76 on SuperUser and 430 on ServerFault). We discussed that option several times, but I now proposed a change to actually make it happen: https://review.opendev.org/#/c/746497/ It's always a difficult decision to make to kill a resource, but I feel like in this case, consolidation and simplification would help. Thoughts, comments? [1] https://stackoverflow.com/questions/tagged/openstack -- Thierry From arnaud.morin at gmail.com Tue Aug 18 12:07:08 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Tue, 18 Aug 2020 12:07:08 +0000 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> Message-ID: <20200818120708.GV31915@sync> Hey all, About the vexxhost strategy to use only one rabbit server and manage HA through rabbit. Do you plan to do the same for MariaDB/MySQL? -- Arnaud Morin On 14.08.20 - 18:45, Fabian Zimmermann wrote: > Hi, > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > one rabbitmq Container per Service. Just the kubernetes self healing is > used as "ha" for rabbitmq. > > That seems to match with my finding: run rabbitmq standalone and use an > external system to restart rabbitmq if required. > > Fabian > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > Fabian, > > > > what do you mean? > > > > >> I think vexxhost is running (1) with their openstack-operator - for > > reasons. > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > wrote: > > > > > > Hello again, > > > > > > just a short update about the results of my tests. > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > 1. without durable-queues and without replication - just one > > rabbitmq-process which gets (somehow) restarted if it fails. > > > 2. durable-queues and replication > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > * broken / non working bindings > > > * broken queues > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > reasons. > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > replication but without durable-queues. > > > > > > May someone point me to the best way to document these findings to some > > official doc? > > > I think a lot of installations out there will run into issues if - under > > load - a node fails. > > > > > > Fabian > > > > > > > > > Am Do., 13. Aug. 
2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > dev.faz at gmail.com>: > > >> > > >> Hi, > > >> > > >> just did some short tests today in our test-environment (without > > durable queues and without replication): > > >> > > >> * started a rally task to generate some load > > >> * kill-9-ed rabbitmq on one node > > >> * rally task immediately stopped and the cloud (mostly) stopped working > > >> > > >> after some debugging i found (again) exchanges which had bindings to > > queues, but these bindings didnt forward any msgs. > > >> Wrote a small script to detect these broken bindings and will now check > > if this is "reproducible" > > >> > > >> then I will try "durable queues" and "durable queues with replication" > > to see if this helps. Even if I would expect > > >> rabbitmq should be able to handle this without these "hidden broken > > bindings" > > >> > > >> This just FYI. > > >> > > >> Fabian > > From jonas.schaefer at cloudandheat.com Tue Aug 18 12:08:42 2020 From: jonas.schaefer at cloudandheat.com (Jonas =?ISO-8859-1?Q?Sch=E4fer?=) Date: Tue, 18 Aug 2020 14:08:42 +0200 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: Message-ID: <6613245.ccrTHCtBl7@antares> Hi Mohammed and all, On Montag, 17. August 2020 14:01:55 CEST Mohammed Naser wrote: > Over the past few days, we were troubleshooting an issue that ended up > having a root cause where keepalived has somehow ended up active in > two different L3 agents. We've yet to find the root cause of how this > happened but removing it and adding it resolved the issue for us. We’ve also seen that behaviour occasionally. The root cause is also unclear for us (so we would’ve love to hear about that). We have anecdotal evidence that a rabbitmq failure was involved, although that makes no sense to me personally. Other causes may be incorrectly cleaned-up namespaces (for example, when you kill or hard-restart the l3 agent, the namespaces will stay around, possibly with the IP address assigned; the keepalived on the other l3 agents will not see the VRRP advertisments anymore and will ALSO assign the IP address. This will also be rectified by a restart always and may require manual namespace cleanup with a tool, a node reboot or an agent disable/enable cycle.). > As we work on improving our monitoring, we wanted to implement > something that gets us the info of # of active routers to check if > there's a router that has >1 active L3 agent but it's hard because > hitting the /l3-agents endpoint on _every_ single router hurts a lot > on performance. > > Is there something else that we can watch which might be more > productive? FYI -- this all goes in the open and will end up inside > the openstack-exporter: > https://github.com/openstack-exporter/openstack-exporter and the Helm > charts will end up with the alerts: > https://github.com/openstack-exporter/helm-charts While I don’t think it fits in your openstack-exporter design, we are currently using the attached script (which we also hereby publish under the terms of the Apache 2.0 license [1]). (Sorry, I lack the time to cleanly publish it somewhere right now.) It checks the state files maintained by the L3 agent conglomerate and exports metrics about the master-ness of the routers as prometheus metrics. Note that this is slightly dangerous since the router IDs are high-cardinality and using that as a label value in Prometheus is discouraged; you may not want to do this in a public cloud setting. 
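Since attachments do not always survive the list archive, here is a minimal sketch of the idea (an illustration only, not the attached script, which handles more edge cases). It assumes the default neutron state path, /var/lib/neutron/ha_confs, where the L3 agent keeps a <router_id>/state file containing "master" or "backup", and it uses the prometheus_client library:

import glob
import os
import time

from prometheus_client import Gauge, start_http_server

STATE_GLOB = "/var/lib/neutron/ha_confs/*/state"

# 1 if keepalived on this node reports "master" for the router, else 0.
# The router ID label is the high-cardinality part mentioned above.
ROUTER_MASTER = Gauge(
    "os_l3_router_master",
    "HA router master state as seen by the local L3 agent",
    ["router_id"],
)

def scrape():
    for state_file in glob.glob(STATE_GLOB):
        router_id = os.path.basename(os.path.dirname(state_file))
        try:
            with open(state_file) as f:
                state = f.read().strip()
        except OSError:
            continue  # router being created or torn down, skip this round
        ROUTER_MASTER.labels(router_id=router_id).set(1 if state == "master" else 0)

if __name__ == "__main__":
    start_http_server(9127)  # arbitrary port for this sketch
    while True:
        scrape()
        time.sleep(30)

You run something like this next to each L3 agent and let Prometheus scrape it; the alert rule then only has to check that the sum of the gauge across all agents is exactly 1 for every router.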
Either way: This allows us to alert on routers where there is not exactly one master state. Downside is that this requires the thing to run locally on the l3 agent nodes. Upside is that it is very efficient, and will also show the master state in some cases where the router was not cleaned up properly (e.g. because the l3 agent and its keepaliveds were killed). kind regards, Jonas [1]: http://www.apache.org/licenses/LICENSE-2.0 -- Jonas Schäfer DevOps Engineer Cloud&Heat Technologies GmbH Königsbrücker Straße 96 | 01099 Dresden +49 351 479 367 37 jonas.schaefer at cloudandheat.com | www.cloudandheat.com New Service: Managed Kubernetes designed for AI & ML https://managed-kubernetes.cloudandheat.com/ Commercial Register: District Court Dresden Register Number: HRB 30549 VAT ID No.: DE281093504 Managing Director: Nicolas Röhrs Authorized signatory: Dr. Marius Feldmann Authorized signatory: Kristina Rübenkamp -------------- next part -------------- A non-text attachment was scrubbed... Name: os_l3_router_exporter.py Type: text/x-python3 Size: 1780 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. URL: From amy at demarco.com Tue Aug 18 13:14:28 2020 From: amy at demarco.com (Amy Marrich) Date: Tue, 18 Aug 2020 08:14:28 -0500 Subject: GHC Mentors Needed for OpenStack Message-ID: Grace Hopper Conference is going virtual this year and once again OpenStack is participating as one of the Open Source Day projects. We are hoping to do some peer programming (aka mentees shadowing folks while they work through a patch) as part of the day. Mentors receive a full conference pass and AnitaB.org membership. Please check out the requirements {0} and apply (1) by August 19, 2020. We are also figuring a way for more folks to be able to mentor, so if you'd like to help but aren't interested in the conference please reach out to me or Victoria(vkmc) by email or on IRC. Thanks and apologies for the short deadline though I can probably get let additions in:) Amy (spotz) 0- Grace Hopper mentorship requirements: https://ghc.anitab.org/get-involved/volunteer/committee-members-and-scholarship-reviewers-2 1- Grace Hopper mentorship application: https://ghc.anitab.org/get-involved/volunteer/committee-members-and-scholarship-reviewers-2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasowang at redhat.com Tue Aug 18 03:24:30 2020 From: jasowang at redhat.com (Jason Wang) Date: Tue, 18 Aug 2020 11:24:30 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200814051601.GD15344@joy-OptiPlex-7040> References: <20200804183503.39f56516.cohuck@redhat.com> <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> Message-ID: On 2020/8/14 下午1:16, Yan Zhao wrote: > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: >> On 2020/8/10 下午3:46, Yan Zhao wrote: >>>> driver is it handled by? 
>>> It looks that the devlink is for network device specific, and in >>> devlink.h, it says >>> include/uapi/linux/devlink.h - Network physical device Netlink >>> interface, >> >> Actually not, I think there used to have some discussion last year and the >> conclusion is to remove this comment. >> >> It supports IB and probably vDPA in the future. >> > hmm... sorry, I didn't find the referred discussion. only below discussion > regarding to why to add devlink. > > https://www.mail-archive.com/netdev at vger.kernel.org/msg95801.html > >This doesn't seem to be too much related to networking? Why can't something > >like this be in sysfs? > > It is related to networking quite bit. There has been couple of > iteration of this, including sysfs and configfs implementations. There > has been a consensus reached that this should be done by netlink. I > believe netlink is really the best for this purpose. Sysfs is not a good > idea See the discussion here: https://patchwork.ozlabs.org/project/netdev/patch/20191115223355.1277139-1-jeffrey.t.kirsher at intel.com/ > > https://www.mail-archive.com/netdev at vger.kernel.org/msg96102.html > >there is already a way to change eth/ib via > >echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1 > > > >sounds like this is another way to achieve the same? > > It is. However the current way is driver-specific, not correct. > For mlx5, we need the same, it cannot be done in this way. Do devlink is > the correct way to go. > > https://lwn.net/Articles/674867/ > There a is need for some userspace API that would allow to expose things > that are not directly related to any device class like net_device of > ib_device, but rather chip-wide/switch-ASIC-wide stuff. > > Use cases: > 1) get/set of port type (Ethernet/InfiniBand) > 2) monitoring of hardware messages to and from chip > 3) setting up port splitters - split port into multiple ones and squash again, > enables usage of splitter cable > 4) setting up shared buffers - shared among multiple ports within one chip > > > > we actually can also retrieve the same information through sysfs, .e.g > > |- [path to device] > |--- migration > | |--- self > | | |---device_api > | | |---mdev_type > | | |---software_version > | | |---device_id > | | |---aggregator > | |--- compatible > | | |---device_api > | | |---mdev_type > | | |---software_version > | | |---device_id > | | |---aggregator > Yes but: - You need one file per attribute (one syscall for one attribute) - Attribute is coupled with kobject All of above seems unnecessary. Another point, as we discussed in another thread, it's really hard to make sure the above API work for all types of devices and frameworks. So having a vendor specific API looks much better. > >>> I feel like it's not very appropriate for a GPU driver to use >>> this interface. Is that right? >> >> I think not though most of the users are switch or ethernet devices. It >> doesn't prevent you from inventing new abstractions. > so need to patch devlink core and the userspace devlink tool? > e.g. devlink migration It quite flexible, you can extend devlink, invent your own or let mgmt to establish devlink directly. > >> Note that devlink is based on netlink, netlink has been widely used by >> various subsystems other than networking. > the advantage of netlink I see is that it can monitor device status and > notify upper layer that migration database needs to get updated. I may miss something, but why this is needed? 
From device point of view, the following capability should be sufficient to support live migration: - set/get device state - report dirty page tracking - set/get capability > But not sure whether openstack would like to use this capability. > As Sean said, it's heavy for openstack. it's heavy for vendor driver > as well :) Well, it depends several factors. Just counting LOCs, sysfs based attributes is not lightweight. Thanks > > And devlink monitor now listens the notification and dumps the state > changes. If we want to use it, need to let it forward the notification > and dumped info to openstack, right? > > Thanks > Yan > From antonios.dimtsoudis at cloud.ionos.com Tue Aug 18 08:24:36 2020 From: antonios.dimtsoudis at cloud.ionos.com (Antonios Dimtsoudis) Date: Tue, 18 Aug 2020 10:24:36 +0200 Subject: [monasca] Setup Monasca from scratch Message-ID: <5e457dae-dc7c-3693-dc34-e622c2cd40f8@cloud.ionos.com> Hi all, i am trying to set up Monasca from scratch. Is there a good introduction / point to start of you would recommend? Thanks in advance, Antonios. From berrange at redhat.com Tue Aug 18 08:55:27 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 18 Aug 2020 09:55:27 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> Message-ID: <20200818085527.GB20215@redhat.com> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > driver is it handled by? > > > > It looks that the devlink is for network device specific, and in > > > > devlink.h, it says > > > > include/uapi/linux/devlink.h - Network physical device Netlink > > > > interface, > > > > > > Actually not, I think there used to have some discussion last year and the > > > conclusion is to remove this comment. > > > > > > It supports IB and probably vDPA in the future. > > > > > hmm... sorry, I didn't find the referred discussion. only below discussion > > regarding to why to add devlink. > > > > https://www.mail-archive.com/netdev at vger.kernel.org/msg95801.html > > >This doesn't seem to be too much related to networking? Why can't something > > >like this be in sysfs? > > > > It is related to networking quite bit. There has been couple of > > iteration of this, including sysfs and configfs implementations. There > > has been a consensus reached that this should be done by netlink. I > > believe netlink is really the best for this purpose. Sysfs is not a good > > idea > > > See the discussion here: > > https://patchwork.ozlabs.org/project/netdev/patch/20191115223355.1277139-1-jeffrey.t.kirsher at intel.com/ > > > > > > https://www.mail-archive.com/netdev at vger.kernel.org/msg96102.html > > >there is already a way to change eth/ib via > > >echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1 > > > > > >sounds like this is another way to achieve the same? > > > > It is. However the current way is driver-specific, not correct. > > For mlx5, we need the same, it cannot be done in this way. Do devlink is > > the correct way to go. 
> > > > https://lwn.net/Articles/674867/ > > There a is need for some userspace API that would allow to expose things > > that are not directly related to any device class like net_device of > > ib_device, but rather chip-wide/switch-ASIC-wide stuff. > > > > Use cases: > > 1) get/set of port type (Ethernet/InfiniBand) > > 2) monitoring of hardware messages to and from chip > > 3) setting up port splitters - split port into multiple ones and squash again, > > enables usage of splitter cable > > 4) setting up shared buffers - shared among multiple ports within one chip > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > |- [path to device] > > |--- migration > > | |--- self > > | | |---device_api > > | | |---mdev_type > > | | |---software_version > > | | |---device_id > > | | |---aggregator > > | |--- compatible > > | | |---device_api > > | | |---mdev_type > > | | |---software_version > > | | |---device_id > > | | |---aggregator > > > > Yes but: > > - You need one file per attribute (one syscall for one attribute) > - Attribute is coupled with kobject > > All of above seems unnecessary. > > Another point, as we discussed in another thread, it's really hard to make > sure the above API work for all types of devices and frameworks. So having a > vendor specific API looks much better. >From the POV of userspace mgmt apps doing device compat checking / migration, we certainly do NOT want to use different vendor specific APIs. We want to have an API that can be used / controlled in a standard manner across vendors. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From jasowang at redhat.com Tue Aug 18 09:01:51 2020 From: jasowang at redhat.com (Jason Wang) Date: Tue, 18 Aug 2020 17:01:51 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818085527.GB20215@redhat.com> References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> Message-ID: <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> An HTML attachment was scrubbed... URL: From cohuck at redhat.com Tue Aug 18 09:06:17 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Tue, 18 Aug 2020 11:06:17 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818085527.GB20215@redhat.com> References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> Message-ID: <20200818110617.05def37c.cohuck@redhat.com> On Tue, 18 Aug 2020 09:55:27 +0100 Daniel P. Berrangé wrote: > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > Another point, as we discussed in another thread, it's really hard to make > > sure the above API work for all types of devices and frameworks. So having a > > vendor specific API looks much better. 
> > From the POV of userspace mgmt apps doing device compat checking / migration, > we certainly do NOT want to use different vendor specific APIs. We want to > have an API that can be used / controlled in a standard manner across vendors. As we certainly will need to have different things to check for different device types and vendor drivers, would it still be fine to have differing (say) attributes, as long as they are presented (and can be discovered) in a standardized way? (See e.g. what I came up with for vfio-ccw in a different branch of this thread.) E.g. version= .type_specific_value0= .type_specific_value1= .vendor_driver_specific_value0= with a type or vendor driver having some kind of get_supported_attributes method? From berrange at redhat.com Tue Aug 18 09:16:28 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 18 Aug 2020 10:16:28 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> Message-ID: <20200818091628.GC20215@redhat.com> Your mail came through as HTML-only so all the quoting and attribution is mangled / lost now :-( On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > On 2020/8/10 下午3:46, Yan Zhao wrote: > we actually can also retrieve the same information through sysfs, .e.g > > |- [path to device] > |--- migration > | |--- self > | | |---device_api > | | |---mdev_type > | | |---software_version > | | |---device_id > | | |---aggregator > | |--- compatible > | | |---device_api > | | |---mdev_type > | | |---software_version > | | |---device_id > | | |---aggregator > > > Yes but: > > - You need one file per attribute (one syscall for one attribute) > - Attribute is coupled with kobject > > All of above seems unnecessary. > > Another point, as we discussed in another thread, it's really hard to make > sure the above API work for all types of devices and frameworks. So having a > vendor specific API looks much better. > > From the POV of userspace mgmt apps doing device compat checking / migration, > we certainly do NOT want to use different vendor specific APIs. We want to > have an API that can be used / controlled in a standard manner across vendors. > > Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a > long debate on sysfs vs devlink). So if we go with sysfs, at least two > APIs needs to be supported ... NB, I was not questioning devlink vs sysfs directly. If devlink is related to netlink, I can't say I'm enthusiastic as IMKE sysfs is easier to deal with. I don't know enough about devlink to have much of an opinion though. The key point was that I don't want the userspace APIs we need to deal with to be vendor specific. What I care about is that we have a *standard* userspace API for performing device compatibility checking / state migration, for use by QEMU/libvirt/ OpenStack, such that we can write code without countless vendor specific code paths. 
If there is vendor specific stuff on the side, that's fine as we can ignore that, but the core functionality for device compat / migration needs to be standardized. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From berrange at redhat.com Tue Aug 18 09:24:33 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 18 Aug 2020 10:24:33 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818110617.05def37c.cohuck@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <20200818110617.05def37c.cohuck@redhat.com> Message-ID: <20200818092433.GD20215@redhat.com> On Tue, Aug 18, 2020 at 11:06:17AM +0200, Cornelia Huck wrote: > On Tue, 18 Aug 2020 09:55:27 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > Another point, as we discussed in another thread, it's really hard to make > > > sure the above API work for all types of devices and frameworks. So having a > > > vendor specific API looks much better. > > > > From the POV of userspace mgmt apps doing device compat checking / migration, > > we certainly do NOT want to use different vendor specific APIs. We want to > > have an API that can be used / controlled in a standard manner across vendors. > > As we certainly will need to have different things to check for > different device types and vendor drivers, would it still be fine to > have differing (say) attributes, as long as they are presented (and can > be discovered) in a standardized way? Yes, the control API and algorithm to deal with the problem needs to have standardization, but the data passed in/out of the APIs can vary. Essentially the key is that vendors should be able to create devices at the kernel, and those devices should "just work" with the existing generic userspace migration / compat checking code, without needing extra vendor specific logic to be added. Note, I'm not saying that the userspace decisions would be perfectly optimal based on generic code. They might be making a simplified decision that while functionally safe, is not the ideal solution. Adding vendor specific code might be able to optimize the userspace decisions, but that should be considered just optimization, not a core must have for any opertion. 
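As an illustration of how small that generic core can be: the following sketch assumes the per-device sysfs layout proposed earlier in this thread (migration/self and migration/compatible with device_api, mdev_type, software_version attributes) and a "same major, destination minor >= source minor" version rule. Both are still under discussion, so treat the paths, attribute names and matching rule purely as placeholders:

import os

ATTRS = ("device_api", "mdev_type", "software_version", "device_id")

def read_attrs(path):
    # Read whichever of the proposed attributes exist under 'path'.
    out = {}
    for name in ATTRS:
        attr = os.path.join(path, name)
        if os.path.exists(attr):
            with open(attr) as f:
                out[name] = f.read().strip()
    return out

def version_ok(src, dst):
    # Assumed rule: versions are "major.minor[.bugfix]", same major required,
    # destination minor must be >= source minor.
    s_major, s_minor = (int(x) for x in src.split(".")[:2])
    d_major, d_minor = (int(x) for x in dst.split(".")[:2])
    return s_major == d_major and d_minor >= s_minor

def devices_compatible(src_dev, dst_dev):
    # src_dev / dst_dev are sysfs paths of the source and destination devices.
    src = read_attrs(os.path.join(src_dev, "migration", "self"))
    dst = read_attrs(os.path.join(dst_dev, "migration", "compatible"))
    if src.get("device_api") != dst.get("device_api"):
        return False
    if src.get("mdev_type") != dst.get("mdev_type"):
        return False
    return version_ok(src.get("software_version", "0.0"),
                      dst.get("software_version", "0.0"))

Vendor specific optimizations can then be layered on top of such a check without the core logic having to know anything about them.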
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From parav at nvidia.com Tue Aug 18 09:32:55 2020 From: parav at nvidia.com (Parav Pandit) Date: Tue, 18 Aug 2020 09:32:55 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> Message-ID: Hi Jason, From: Jason Wang Sent: Tuesday, August 18, 2020 2:32 PM On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: On 2020/8/14 下午1:16, Yan Zhao wrote: On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: On 2020/8/10 下午3:46, Yan Zhao wrote: driver is it handled by? It looks that the devlink is for network device specific, and in devlink.h, it says include/uapi/linux/devlink.h - Network physical device Netlink interface, Actually not, I think there used to have some discussion last year and the conclusion is to remove this comment. [...] > Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a long debate on sysfs vs devlink). So if we go with sysfs, at least two APIs needs to be supported ... We had internal discussion and proposal on this topic. I wanted Eli Cohen to be back from vacation on Wed 8/19, but since this is active discussion right now, I will share the thoughts anyway. Here are the initial round of thoughts and proposal. User requirements: --------------------------- 1. User might want to create one or more vdpa devices per PCI PF/VF/SF. 2. User might want to create one or more vdpa devices of type net/blk or other type. 3. User needs to look and dump at the health of the queues for debug purpose. 4. During vdpa net device creation time, user may have to provide a MAC address and/or VLAN. 5. User should be able to set/query some of the attributes for debug/compatibility check 6. When user wants to create vdpa device, it needs to know which device supports creation. 7. User should be able to see the queue statistics of doorbells, wqes etc regardless of class type To address above requirements, there is a need of vendor agnostic tool, so that user can create/config/delete vdpa device(s) regardless of the vendor. Hence, We should have a tool that lets user do it. Examples: ------------- (a) List parent devices which supports creating vdpa devices. It also shows which class types supported by this parent device. In below command two parent devices support vdpa device creation. First is PCI VF whose bdf is 03.00:5. Second is PCI SF whose name is mlx5_sf.1 $ vdpa list pd pci/0000:03.00:5 class_supports net vdpa virtbus/mlx5_sf.1 class_supports net (b) Now add a vdpa device and show the device. $ vdpa dev add pci/0000:03.00:5 type net $ vdpa dev show vdpa0 at pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues 4 (c) vdpa dev show features vdpa0 iommu platform version 1 (d) dump vdpa statistics $ vdpa dev stats show vdpa0 kickdoorbells 10 wqes 100 (e) Now delete a vdpa device previously created. 
$ vdpa dev del vdpa0 Design overview: ----------------------- 1. Above example tool runs over netlink socket interface. 2. This enables users to return meaningful error strings in addition to code so that user can be more informed. Often this is missing in ioctl()/configfs/sysfs interfaces. 3. This tool over netlink enables syscaller tests to be more usable like other subsystems to keep kernel robust 4. This provides vendor agnostic view of all vdpa capable parent and vdpa devices. 5. Each driver which supports vdpa device creation, registers the parent device along with supported classes. FAQs: -------- 1. Why not using devlink? Ans: Because as vdpa echo system grows, devlink will fall short of extending vdpa specific params, attributes, stats. 2. Why not use sysfs? Ans: (a) Because running syscaller infrastructure can run well over netlink sockets like it runs for several subsystem. (b) it lacks the ability to return error messages. Doing via kernel log is just doesn't work. (c) Why not using some ioctl()? It will reinvent the wheel of netlink that has TLV formats for several attributes. 3. Why not configs? It follows same limitation as that of sysfs. Low level design and driver APIS: -------------------------------------------- Will post once we discuss this further. From cohuck at redhat.com Tue Aug 18 09:36:52 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Tue, 18 Aug 2020 11:36:52 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818091628.GC20215@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> Message-ID: <20200818113652.5d81a392.cohuck@redhat.com> On Tue, 18 Aug 2020 10:16:28 +0100 Daniel P. Berrangé wrote: > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > we actually can also retrieve the same information through sysfs, .e.g > > > > |- [path to device] > > |--- migration > > | |--- self > > | | |---device_api > > | | |---mdev_type > > | | |---software_version > > | | |---device_id > > | | |---aggregator > > | |--- compatible > > | | |---device_api > > | | |---mdev_type > > | | |---software_version > > | | |---device_id > > | | |---aggregator > > > > > > Yes but: > > > > - You need one file per attribute (one syscall for one attribute) > > - Attribute is coupled with kobject Is that really that bad? You have the device with an embedded kobject anyway, and you can just put things into an attribute group? [Also, I think that self/compatible split in the example makes things needlessly complex. Shouldn't semantic versioning and matching already cover nearly everything? I would expect very few cases that are more complex than that. Maybe the aggregation stuff, but I don't think we need that self/compatible split for that, either.] > > > > All of above seems unnecessary. > > > > Another point, as we discussed in another thread, it's really hard to make > > sure the above API work for all types of devices and frameworks. 
So having a > > vendor specific API looks much better. > > > > From the POV of userspace mgmt apps doing device compat checking / migration, > > we certainly do NOT want to use different vendor specific APIs. We want to > > have an API that can be used / controlled in a standard manner across vendors. > > > > Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a > > long debate on sysfs vs devlink). So if we go with sysfs, at least two > > APIs needs to be supported ... > > NB, I was not questioning devlink vs sysfs directly. If devlink is related > to netlink, I can't say I'm enthusiastic as IMKE sysfs is easier to deal > with. I don't know enough about devlink to have much of an opinion though. > The key point was that I don't want the userspace APIs we need to deal with > to be vendor specific. From what I've seen of devlink, it seems quite nice; but I understand why sysfs might be easier to deal with (especially as there's likely already a lot of code using it.) I understand that some users would like devlink because it is already widely used for network drivers (and some others), but I don't think the majority of devices used with vfio are network (although certainly a lot of them are.) > > What I care about is that we have a *standard* userspace API for performing > device compatibility checking / state migration, for use by QEMU/libvirt/ > OpenStack, such that we can write code without countless vendor specific > code paths. > > If there is vendor specific stuff on the side, that's fine as we can ignore > that, but the core functionality for device compat / migration needs to be > standardized. To summarize: - choose one of sysfs or devlink - have a common interface, with a standardized way to add vendor-specific attributes ? From cohuck at redhat.com Tue Aug 18 09:38:55 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Tue, 18 Aug 2020 11:38:55 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818092433.GD20215@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <20200818110617.05def37c.cohuck@redhat.com> <20200818092433.GD20215@redhat.com> Message-ID: <20200818113855.647938c0.cohuck@redhat.com> On Tue, 18 Aug 2020 10:24:33 +0100 Daniel P. Berrangé wrote: > On Tue, Aug 18, 2020 at 11:06:17AM +0200, Cornelia Huck wrote: > > On Tue, 18 Aug 2020 09:55:27 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > Another point, as we discussed in another thread, it's really hard to make > > > > sure the above API work for all types of devices and frameworks. So having a > > > > vendor specific API looks much better. > > > > > > From the POV of userspace mgmt apps doing device compat checking / migration, > > > we certainly do NOT want to use different vendor specific APIs. We want to > > > have an API that can be used / controlled in a standard manner across vendors. > > > > As we certainly will need to have different things to check for > > different device types and vendor drivers, would it still be fine to > > have differing (say) attributes, as long as they are presented (and can > > be discovered) in a standardized way? 
> > Yes, the control API and algorithm to deal with the problem needs to > have standardization, but the data passed in/out of the APIs can vary. > > Essentially the key is that vendors should be able to create devices > at the kernel, and those devices should "just work" with the existing > generic userspace migration / compat checking code, without needing > extra vendor specific logic to be added. > > Note, I'm not saying that the userspace decisions would be perfectly > optimal based on generic code. They might be making a simplified > decision that while functionally safe, is not the ideal solution. > Adding vendor specific code might be able to optimize the userspace > decisions, but that should be considered just optimization, not a > core must have for any opertion. Yes, that sounds reasonable. From parav at nvidia.com Tue Aug 18 09:39:24 2020 From: parav at nvidia.com (Parav Pandit) Date: Tue, 18 Aug 2020 09:39:24 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818113652.5d81a392.cohuck@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> Message-ID: Hi Cornelia, > From: Cornelia Huck > Sent: Tuesday, August 18, 2020 3:07 PM > To: Daniel P. Berrangé > Cc: Jason Wang ; Yan Zhao > ; kvm at vger.kernel.org; libvir-list at redhat.com; > qemu-devel at nongnu.org; Kirti Wankhede ; > eauger at redhat.com; xin-ran.wang at intel.com; corbet at lwn.net; openstack- > discuss at lists.openstack.org; shaohe.feng at intel.com; kevin.tian at intel.com; > Parav Pandit ; jian-feng.ding at intel.com; > dgilbert at redhat.com; zhenyuw at linux.intel.com; hejie.xu at intel.com; > bao.yumeng at zte.com.cn; Alex Williamson ; > eskultet at redhat.com; smooney at redhat.com; intel-gvt- > dev at lists.freedesktop.org; Jiri Pirko ; > dinechin at redhat.com; devel at ovirt.org > Subject: Re: device compatibility interface for live migration with assigned > devices > > On Tue, 18 Aug 2020 10:16:28 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > we actually can also retrieve the same information through sysfs, > > > .e.g > > > > > > |- [path to device] > > > |--- migration > > > | |--- self > > > | | |---device_api > > > | | |---mdev_type > > > | | |---software_version > > > | | |---device_id > > > | | |---aggregator > > > | |--- compatible > > > | | |---device_api > > > | | |---mdev_type > > > | | |---software_version > > > | | |---device_id > > > | | |---aggregator > > > > > > > > > Yes but: > > > > > > - You need one file per attribute (one syscall for one attribute) > > > - Attribute is coupled with kobject > > Is that really that bad? You have the device with an embedded kobject > anyway, and you can just put things into an attribute group? > > [Also, I think that self/compatible split in the example makes things > needlessly complex. 
Shouldn't semantic versioning and matching already > cover nearly everything? I would expect very few cases that are more > complex than that. Maybe the aggregation stuff, but I don't think we need > that self/compatible split for that, either.] > > > > > > > All of above seems unnecessary. > > > > > > Another point, as we discussed in another thread, it's really hard > > > to make sure the above API work for all types of devices and > > > frameworks. So having a vendor specific API looks much better. > > > > > > From the POV of userspace mgmt apps doing device compat checking / > > > migration, we certainly do NOT want to use different vendor > > > specific APIs. We want to have an API that can be used / controlled in a > standard manner across vendors. > > > > > > Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a > > > long debate on sysfs vs devlink). So if we go with sysfs, at least two > > > APIs needs to be supported ... > > > > NB, I was not questioning devlink vs sysfs directly. If devlink is > > related to netlink, I can't say I'm enthusiastic as IMKE sysfs is > > easier to deal with. I don't know enough about devlink to have much of an > opinion though. > > The key point was that I don't want the userspace APIs we need to deal > > with to be vendor specific. > > From what I've seen of devlink, it seems quite nice; but I understand why > sysfs might be easier to deal with (especially as there's likely already a lot of > code using it.) > > I understand that some users would like devlink because it is already widely > used for network drivers (and some others), but I don't think the majority of > devices used with vfio are network (although certainly a lot of them are.) > > > > > What I care about is that we have a *standard* userspace API for > > performing device compatibility checking / state migration, for use by > > QEMU/libvirt/ OpenStack, such that we can write code without countless > > vendor specific code paths. > > > > If there is vendor specific stuff on the side, that's fine as we can > > ignore that, but the core functionality for device compat / migration > > needs to be standardized. > > To summarize: > - choose one of sysfs or devlink > - have a common interface, with a standardized way to add > vendor-specific attributes > ? Please refer to my previous email which has more example and details. From dbengt at redhat.com Tue Aug 18 10:19:35 2020 From: dbengt at redhat.com (Daniel Bengtsson) Date: Tue, 18 Aug 2020 12:19:35 +0200 Subject: Can't fetch from opendev. In-Reply-To: <20200817143703.c5rh3eqcl3ihxy4m@yuggoth.org> References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> <20200817143703.c5rh3eqcl3ihxy4m@yuggoth.org> Message-ID: <6590e740-00f1-ee60-ac00-5872039e0cb0@redhat.com> On 8/17/20 4:37 PM, Jeremy Stanley wrote: > [keeping Daniel in Cc as he doesn't appear to be subscribed] I will check why. I don't understand the problem. > What command(s) did you run and what error message is Git giving > you? That paste doesn't look like an error, just a trace of the > internal operations which were performed. I try only to do a fetch on this remote. I have no explicit error. But the fetch blocks indefinitely. > Are you and your colleague both connecting from the same network? > Possibly the same corporate network or the same VPN?I'm not sure to understand what is the problem with the vpn. But yes we used the same one. 
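For reference, the fetch that hangs is just a plain fetch against the opendev remote. Rerunning it with Git's tracing switched on, something along these lines (assuming a reasonably recent Git that honours these variables):

GIT_TRACE=1 GIT_CURL_VERBOSE=1 git fetch https://opendev.org/openstack/tripleo-heat-templates

should at least show whether it stalls while connecting or during the pack transfer.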
From emilien at redhat.com Tue Aug 18 14:28:06 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 18 Aug 2020 10:28:06 -0400 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo Message-ID: Hi people, If you don't know Takashi yet, he has been involved in the Puppet OpenStack project and helped *a lot* in its maintenance (and by maintenance I mean not-funny-work). When our community was getting smaller and smaller, he joined us and our review velocity went back to eleven. He became a core maintainer very quickly and we're glad to have him onboard. He's also been involved in taking care of puppet-tripleo for a few months and I believe he has more than enough knowledge of the module to provide core reviews and be part of the core maintainer group. I also noticed his amount of contribution (bug fixes, improvements, reviews, etc) in other TripleO repos and I'm confident he'll make his way to core in TripleO at some point. For now I would like to propose him to be core in puppet-tripleo. As usual, any feedback is welcome, but in the meantime I want to thank Takashi for his work in TripleO and we're super happy to have new contributors! Thanks, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From abishop at redhat.com Tue Aug 18 14:37:44 2020 From: abishop at redhat.com (Alan Bishop) Date: Tue, 18 Aug 2020 07:37:44 -0700 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: Message-ID: On Tue, Aug 18, 2020 at 7:34 AM Emilien Macchi wrote: > Hi people, > > If you don't know Takashi yet, he has been involved in the Puppet > OpenStack project and helped *a lot* in its maintenance (and by maintenance > I mean not-funny-work). When our community was getting smaller and smaller, > he joined us and our review velicity went back to eleven. He became a core > maintainer very quickly and we're glad to have him onboard. > > He's also been involved in taking care of puppet-tripleo for a few months > and I believe he has more than enough knowledge on the module to provide > core reviews and be part of the core maintainer group. I also noticed his > amount of contribution (bug fixes, improvements, reviews, etc) in other > TripleO repos and I'm confident he'll make his road to be core in TripleO > at some point. For now I would like him to propose him to be core in > puppet-tripleo. > > As usual, any feedback is welcome but in the meantime I want to thank > Takashi for his work in TripleO and we're super happy to have new > contributors! > Big +1 from me! > Thanks, > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Aug 18 14:46:55 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 18 Aug 2020 08:46:55 -0600 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: Message-ID: +1 On Tue, Aug 18, 2020 at 8:38 AM Emilien Macchi wrote: > > Hi people, > > If you don't know Takashi yet, he has been involved in the Puppet OpenStack project and helped *a lot* in its maintenance (and by maintenance I mean not-funny-work). When our community was getting smaller and smaller, he joined us and our review velicity went back to eleven. He became a core maintainer very quickly and we're glad to have him onboard.
> > He's also been involved in taking care of puppet-tripleo for a few months and I believe he has more than enough knowledge on the module to provide core reviews and be part of the core maintainer group. I also noticed his amount of contribution (bug fixes, improvements, reviews, etc) in other TripleO repos and I'm confident he'll make his road to be core in TripleO at some point. For now I would like him to propose him to be core in puppet-tripleo. > > As usual, any feedback is welcome but in the meantime I want to thank Takashi for his work in TripleO and we're super happy to have new contributors! > > Thanks, > -- > Emilien Macchi From moreira.belmiro.email.lists at gmail.com Tue Aug 18 14:49:37 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Tue, 18 Aug 2020 16:49:37 +0200 Subject: [nova][ops] Live migration and CPU features Message-ID: Hi, in our infrastructure we have always compute nodes that need a hardware intervention and as a consequence they are rebooted, bringing a new kernel, kvm, ... In order to have a good compromise between performance and flexibility (live migration) we have been using "host-model" for the "cpu_mode" configuration of our service VMs. We didn't expect to have CPU compatibility issues because we have the same hardware type per cell. The problem is that when a compute node is rebooted the instance domain is recreated with the new cpu features that were introduced because of the reboot (using centOS). If there are new CPU features exposed, this basically blocks live migration to all the non rebooted compute nodes (those cpu features are not exposed, yet). The nova-scheduler doesn't know about them when scheduling the live migration destination. I wonder how other operators are solving this issue. I don't like stopping OS upgrades. What I'm considering is to define a "custom" cpu_mode for each hardware type. I would appreciate your comments and learn how you are solving this problem. Belmiro -------------- next part -------------- An HTML attachment was scrubbed... URL: From amuller at redhat.com Tue Aug 18 14:48:23 2020 From: amuller at redhat.com (Assaf Muller) Date: Tue, 18 Aug 2020 10:48:23 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: <6613245.ccrTHCtBl7@antares> References: <6613245.ccrTHCtBl7@antares> Message-ID: On Tue, Aug 18, 2020 at 8:12 AM Jonas Schäfer wrote: > > Hi Mohammed and all, > > On Montag, 17. August 2020 14:01:55 CEST Mohammed Naser wrote: > > Over the past few days, we were troubleshooting an issue that ended up > > having a root cause where keepalived has somehow ended up active in > > two different L3 agents. We've yet to find the root cause of how this > > happened but removing it and adding it resolved the issue for us. > > We’ve also seen that behaviour occasionally. The root cause is also unclear > for us (so we would’ve love to hear about that). Insert shameless plug for the Neutron OVN backend. One of it's advantages is that it's L3 HA architecture is cleaner and more scalable (this is coming from the dude that wrote the L3 HA code we're all suffering from =D). The ML2/OVS L3 HA architecture has it's issues - I've seen it work at 100's of customer sites at scale, so I don't want to knock it too much, but just a day ago I got an internal customer ticket about keepalived falling over on a particular router that has 200 floating IPs. It works but it's not perfect. I'm sure the OVN implementation isn't either but it's simply cleaner and has less moving parts. 
It uses BFD to monitor the tunnel endpoints, so failover is faster too. Plus, it doesn't use keepalived. > We have anecdotal evidence > that a rabbitmq failure was involved, although that makes no sense to me > personally. Other causes may be incorrectly cleaned-up namespaces (for > example, when you kill or hard-restart the l3 agent, the namespaces will stay > around, possibly with the IP address assigned; the keepalived on the other l3 > agents will not see the VRRP advertisments anymore and will ALSO assign the IP > address. This will also be rectified by a restart always and may require > manual namespace cleanup with a tool, a node reboot or an agent disable/enable > cycle.). > > > As we work on improving our monitoring, we wanted to implement > > something that gets us the info of # of active routers to check if > > there's a router that has >1 active L3 agent but it's hard because > > hitting the /l3-agents endpoint on _every_ single router hurts a lot > > on performance. > > > > Is there something else that we can watch which might be more > > productive? FYI -- this all goes in the open and will end up inside > > the openstack-exporter: > > https://github.com/openstack-exporter/openstack-exporter and the Helm > > charts will end up with the alerts: > > https://github.com/openstack-exporter/helm-charts > > While I don’t think it fits in your openstack-exporter design, we are > currently using the attached script (which we also hereby publish under the > terms of the Apache 2.0 license [1]). (Sorry, I lack the time to cleanly > publish it somewhere right now.) > > It checks the state files maintained by the L3 agent conglomerate and exports > metrics about the master-ness of the routers as prometheus metrics. > > Note that this is slightly dangerous since the router IDs are high-cardinality > and using that as a label value in Prometheus is discouraged; you may not want > to do this in a public cloud setting. > > Either way: This allows us to alert on routers where there is not exactly one > master state. Downside is that this requires the thing to run locally on the > l3 agent nodes. Upside is that it is very efficient, and will also show the > master state in some cases where the router was not cleaned up properly (e.g. > because the l3 agent and its keepaliveds were killed). > > kind regards, > Jonas > > [1]: http://www.apache.org/licenses/LICENSE-2.0 > -- > Jonas Schäfer > DevOps Engineer > > Cloud&Heat Technologies GmbH > Königsbrücker Straße 96 | 01099 Dresden > +49 351 479 367 37 > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > New Service: > Managed Kubernetes designed for AI & ML > https://managed-kubernetes.cloudandheat.com/ > > Commercial Register: District Court Dresden > Register Number: HRB 30549 > VAT ID No.: DE281093504 > Managing Director: Nicolas Röhrs > Authorized signatory: Dr. Marius Feldmann > Authorized signatory: Kristina Rübenkamp From beagles at redhat.com Tue Aug 18 14:53:03 2020 From: beagles at redhat.com (Brent Eagles) Date: Tue, 18 Aug 2020 12:23:03 -0230 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: Message-ID: +1 On Tue, Aug 18, 2020 at 12:01 PM Emilien Macchi wrote: > Hi people, > > If you don't know Takashi yet, he has been involved in the Puppet > OpenStack project and helped *a lot* in its maintenance (and by maintenance > I mean not-funny-work). When our community was getting smaller and smaller, > he joined us and our review velicity went back to eleven. 
He became a core > maintainer very quickly and we're glad to have him onboard. > > He's also been involved in taking care of puppet-tripleo for a few months > and I believe he has more than enough knowledge on the module to provide > core reviews and be part of the core maintainer group. I also noticed his > amount of contribution (bug fixes, improvements, reviews, etc) in other > TripleO repos and I'm confident he'll make his road to be core in TripleO > at some point. For now I would like him to propose him to be core in > puppet-tripleo. > > As usual, any feedback is welcome but in the meantime I want to thank > Takashi for his work in TripleO and we're super happy to have new > contributors! > > Thanks, > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Aug 18 15:00:52 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 18 Aug 2020 17:00:52 +0200 Subject: [neutron][gate] verbose q-svc log files and e-r indexing In-Reply-To: <20200818103323.wq5upyjn4nzsqhx7@skaplons-mac> References: <20200818103323.wq5upyjn4nzsqhx7@skaplons-mac> Message-ID: <20200818150052.u4xkjsptejikwcny@skaplons-mac> Hi, I proposed patch [1] which seems that decreased size of the neutron-server log a bit - see [2] but it's still about 40M :/ [1] https://review.opendev.org/#/c/730879/ [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcdn.com/746714/1/check/tempest-full-py3/5c1399c/controller/logs/ On Tue, Aug 18, 2020 at 12:33:23PM +0200, Slawek Kaplonski wrote: > Hi, > > I opened LP for that [1] and I will propose some fix for it ASAP. > > On Mon, Aug 17, 2020 at 10:50:15AM -0700, melanie witt wrote: > > Hi all, > > > > Recently we've noticed elastic search indexing is behind 115 hours [1] and we looked for abnormally large log files being generated in the gate. > > > > We found that the q-svc log is very large, one example being 71.6M [2]. There is a lot of Time-Cost profiling output in the log, like this: > > > > Aug 17 14:22:23.210076 ubuntu-bionic-ovh-bhs1-0019298855 neutron-server[5168]: DEBUG neutron_lib.utils.helpers [req-75719db1-4abf-4500-bb0a-6d24e82cd4fd req-d88e7052-7da9-4bc9-8b35-5730ae76dcad service neutron] Time-cost: call 48e628cc-8c3a-408d-a36f-b219524480e0 function apply_funcs start {{(pid=5554) wrapper /usr/local/lib/python3.6/dist-packages/neutron_lib/utils/helpers.py:218}} > > > > We saw that there was a recent-ish change to remove some of the profiling output [3] but it was only for the get_objects method. > > > > Looking at the total number of lines in the file vs the number of lines without apply_funcs Time-Cost output: > > > > $ wc -l screen-q-svc.txt > > 186387 screen-q-svc.txt > > > > $ grep -v "function apply_funcs" screen-q-svc.txt|wc -l > > 102593 > > > > Would it be possible to remove this profiling output from the gate log to give elastic search indexing a better chance at keeping up? Or is there something else I've missed that could be made less verbose in the logging? > > > > Thanks for your help. 
> > > > Cheers, > > -melanie > > > > [1] http://status.openstack.org/elastic-recheck > > [2] https://b6ba3b9af8fd7de57099-18aa39cea11f738aa67ebd6bc9fb5e4c.ssl.cf2.rackcdn.com/744958/4/check/tempest-integrated-compute/4421bf9/controller/logs/screen-q-svc.txt > > [3] https://review.opendev.org/741540 > > > > [1] https://bugs.launchpad.net/neutron/+bug/1892017 > > -- > Slawek Kaplonski > Principal software engineer > Red Hat -- Slawek Kaplonski Principal software engineer Red Hat From luis.ramirez at opencloud.es Tue Aug 18 15:01:09 2020 From: luis.ramirez at opencloud.es (Luis Ramirez) Date: Tue, 18 Aug 2020 17:01:09 +0200 Subject: [nova][ops] Live migration and CPU features In-Reply-To: References: Message-ID: Hi, Try to choose a custom cpu_model that fits into your infra. This should be the best approach to avoid this kind of problem. If the performance is not an issue for the tenants, KVM64 should be a good election. Br, Luis Rmz Blockchain, DevOps & Open Source Cloud Solutions Architect ---------------------------------------- Founder & CEO OpenCloud.es luis.ramirez at opencloud.es Skype ID: d.overload Hangouts: luis.ramirez at opencloud.es [image: ] +34 911 950 123 / [image: ]+39 392 1289553 / [image: ]+49 152 26917722 / Česká republika: +420 774 274 882 ----------------------------------------------------- El mar., 18 ago. 2020 a las 16:55, Belmiro Moreira (< moreira.belmiro.email.lists at gmail.com>) escribió: > Hi, > in our infrastructure we have always compute nodes that need a hardware > intervention and as a consequence they are rebooted, bringing a new kernel, > kvm, ... > > In order to have a good compromise between performance and flexibility > (live migration) we have been using "host-model" for the "cpu_mode" > configuration of our service VMs. We didn't expect to have CPU > compatibility issues because we have the same hardware type per cell. > > The problem is that when a compute node is rebooted the instance domain is > recreated with the new cpu features that were introduced because of the > reboot (using centOS). > > If there are new CPU features exposed, this basically blocks live > migration to all the non rebooted compute nodes (those cpu features are not > exposed, yet). The nova-scheduler doesn't know about them when scheduling > the live migration destination. > > I wonder how other operators are solving this issue. > I don't like stopping OS upgrades. > What I'm considering is to define a "custom" cpu_mode for each hardware > type. > > I would appreciate your comments and learn how you are solving this > problem. > > Belmiro > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Tue Aug 18 15:01:35 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 18 Aug 2020 17:01:35 +0200 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: Message-ID: <1c51af8b-50ed-ccc7-61bf-3569cbc81d43@redhat.com> On 8/18/20 4:37 PM, Alan Bishop wrote: > > > On Tue, Aug 18, 2020 at 7:34 AM Emilien Macchi > wrote: > > Hi people, > > If you don't know Takashi yet, he has been involved in the Puppet > OpenStack project and helped *a lot* in its maintenance (and by > maintenance I mean not-funny-work). When our community was getting > smaller and smaller, he joined us and our review velicity went back > to eleven. He became a core maintainer very quickly and we're glad > to have him onboard. 
> > He's also been involved in taking care of puppet-tripleo for a few > months and I believe he has more than enough knowledge on the module > to provide core reviews and be part of the core maintainer group. I > also noticed his amount of contribution (bug fixes, improvements, > reviews, etc) in other TripleO repos and I'm confident he'll make > his road to be core in TripleO at some point. For now I would like > him to propose him to be core in puppet-tripleo. > > As usual, any feedback is welcome but in the meantime I want to > thank Takashi for his work in TripleO and we're super happy to have > new contributors! > > > Big +1 from me! +1 > > > Thanks, > -- > Emilien Macchi > -- Best regards, Bogdan Dobrelya, Irc #bogdando From dev.faz at gmail.com Tue Aug 18 15:06:47 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Tue, 18 Aug 2020 17:06:47 +0200 Subject: [nova][ops] Live migration and CPU features In-Reply-To: References: Message-ID: Hi, We are using the "custom"-way. But this does not protect you from all issues. We had problems with a new cpu-generation not (jet) detected correctly in an libvirt-version. So libvirt failed back to the "desktop"-cpu of this newer generation, but didnt support/detect some features => blocked live-migration. Fabian Am Di., 18. Aug. 2020 um 16:54 Uhr schrieb Belmiro Moreira : > > Hi, > in our infrastructure we have always compute nodes that need a hardware intervention and as a consequence they are rebooted, bringing a new kernel, kvm, ... > > In order to have a good compromise between performance and flexibility (live migration) we have been using "host-model" for the "cpu_mode" configuration of our service VMs. We didn't expect to have CPU compatibility issues because we have the same hardware type per cell. > > The problem is that when a compute node is rebooted the instance domain is recreated with the new cpu features that were introduced because of the reboot (using centOS). > > If there are new CPU features exposed, this basically blocks live migration to all the non rebooted compute nodes (those cpu features are not exposed, yet). The nova-scheduler doesn't know about them when scheduling the live migration destination. > > I wonder how other operators are solving this issue. > I don't like stopping OS upgrades. > What I'm considering is to define a "custom" cpu_mode for each hardware type. > > I would appreciate your comments and learn how you are solving this problem. > > Belmiro > From smooney at redhat.com Tue Aug 18 15:11:45 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 18 Aug 2020 16:11:45 +0100 Subject: [nova][ops] Live migration and CPU features In-Reply-To: References: Message-ID: <10be83e71171f752a926af614d4541ab77d385e8.camel@redhat.com> On Tue, 2020-08-18 at 17:01 +0200, Luis Ramirez wrote: > Hi, > > Try to choose a custom cpu_model that fits into your infra. This should be > the best approach to avoid this kind of problem. If the performance is not > an issue for the tenants, KVM64 should be a good election. you should neve use kvm64 in production it is not maintained for security vulnerablity e.g. it is never updated with any fo the feature flag to mitigate security issue like specter ectra. its perfect for ci and test where you dont contol the underlying cloud and are using nested virt. its also semi resonable for nested vms but its not a good choice for the host. 
you should either use host-passthough and segreate your host using aggreates or other means to ensure live migration capavlity or use a custom model. host model is a good default provided you upgrade all host at the same time and you are ok with the feature set changing. host model has a 1 way migration proablem where it possible to migrate form old host to new but not new to old if the vm is hard rebooted in between. so when using host model we still recommend segrationg host by cpu generation to avoid that. > > Br, > Luis Rmz > Blockchain, DevOps & Open Source Cloud Solutions Architect > ---------------------------------------- > Founder & CEO > OpenCloud.es > luis.ramirez at opencloud.es > Skype ID: d.overload > Hangouts: luis.ramirez at opencloud.es > [image: ] +34 911 950 123 / [image: ]+39 392 1289553 / [image: ]+49 152 > 26917722 / Česká republika: +420 774 274 882 > ----------------------------------------------------- > > > El mar., 18 ago. 2020 a las 16:55, Belmiro Moreira (< > moreira.belmiro.email.lists at gmail.com>) escribió: > > > Hi, > > in our infrastructure we have always compute nodes that need a hardware > > intervention and as a consequence they are rebooted, bringing a new kernel, > > kvm, ... > > > > In order to have a good compromise between performance and flexibility > > (live migration) we have been using "host-model" for the "cpu_mode" > > configuration of our service VMs. We didn't expect to have CPU > > compatibility issues because we have the same hardware type per cell. > > > > The problem is that when a compute node is rebooted the instance domain is > > recreated with the new cpu features that were introduced because of the > > reboot (using centOS). > > > > If there are new CPU features exposed, this basically blocks live > > migration to all the non rebooted compute nodes (those cpu features are not > > exposed, yet). The nova-scheduler doesn't know about them when scheduling > > the live migration destination. > > > > I wonder how other operators are solving this issue. > > I don't like stopping OS upgrades. > > What I'm considering is to define a "custom" cpu_mode for each hardware > > type. > > > > I would appreciate your comments and learn how you are solving this > > problem. > > > > Belmiro > > > > From smooney at redhat.com Tue Aug 18 15:16:17 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 18 Aug 2020 16:16:17 +0100 Subject: [nova][ops] Live migration and CPU features In-Reply-To: References: Message-ID: <44347504ff7308a6c3b4155060c778fad368a002.camel@redhat.com> On Tue, 2020-08-18 at 17:06 +0200, Fabian Zimmermann wrote: > Hi, > > We are using the "custom"-way. But this does not protect you from all issues. > > We had problems with a new cpu-generation not (jet) detected correctly > in an libvirt-version. So libvirt failed back to the "desktop"-cpu of > this newer generation, but didnt support/detect some features => > blocked live-migration. yes that is common when using really new hardware. having previouly worked at intel and hitting this often that one of the reason i tend to default to host-passthouh and recommend using AZ or aggreate to segreatate the cloud for live migration. in the case where your libvirt does not know about the new cpus your best approch is to use the newest server cpu model that it know about and then if you really need the new fature you can try to add theem using the config options but that is effectivly the same as using host-passhtough which is why i default to that as a workaround instead. 
> > Fabian > > Am Di., 18. Aug. 2020 um 16:54 Uhr schrieb Belmiro Moreira > : > > > > Hi, > > in our infrastructure we have always compute nodes that need a hardware intervention and as a consequence they are > > rebooted, bringing a new kernel, kvm, ... > > > > In order to have a good compromise between performance and flexibility (live migration) we have been using "host- > > model" for the "cpu_mode" configuration of our service VMs. We didn't expect to have CPU compatibility issues > > because we have the same hardware type per cell. > > > > The problem is that when a compute node is rebooted the instance domain is recreated with the new cpu features that > > were introduced because of the reboot (using centOS). > > > > If there are new CPU features exposed, this basically blocks live migration to all the non rebooted compute nodes > > (those cpu features are not exposed, yet). The nova-scheduler doesn't know about them when scheduling the live > > migration destination. > > > > I wonder how other operators are solving this issue. > > I don't like stopping OS upgrades. > > What I'm considering is to define a "custom" cpu_mode for each hardware type. > > > > I would appreciate your comments and learn how you are solving this problem. > > > > Belmiro > > > > From fungi at yuggoth.org Tue Aug 18 15:24:14 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 18 Aug 2020 15:24:14 +0000 Subject: Can't fetch from opendev. In-Reply-To: <6590e740-00f1-ee60-ac00-5872039e0cb0@redhat.com> References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> <20200817143703.c5rh3eqcl3ihxy4m@yuggoth.org> <6590e740-00f1-ee60-ac00-5872039e0cb0@redhat.com> Message-ID: <20200818152414.s5srmotngy7a7w7r@yuggoth.org> On 2020-08-18 12:19:35 +0200 (+0200), Daniel Bengtsson wrote: [...] > I try only to do a fetch on this remote. I have no explicit error. > But the fetch blocks indefinitely. [...] Thanks, that's an important detail. So you're running this: git fetch https://opendev.org/openstack/tripleo-heat-templates and it just hangs indefinitely and never returns an error? This makes me suspect a routing problem. I've seen it most often when users have broken IPv6 routing locally. If you're using Git 2.16 or later, it provides the option of specifying IPv4 or IPv6 on the command line. To test this, add a -4 after the "fetch" like: git fetch -4 https://opendev.org/openstack/tripleo-heat-templates One reason I suspect this might be the problem is that GitHub is IPv4-only, so if you have something black-holing or blocking traffic for global IPv6 routes, then that could cause the behavior you're observing. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From emilien at redhat.com Tue Aug 18 16:38:03 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 18 Aug 2020 12:38:03 -0400 Subject: [tripleo] Help needed to write a Third Party integration guide Message-ID: Hi people, We have been requested several times (for good reasons) about how to write out-of-tree files for TripleO integration (Heat templates, Ansible roles, Container Images layouts, etc). For example, Dell has an external repository ( https://github.com/dell/tripleo-powerflex) where they have pretty much all they need to install their services in out-of-tree fashion (I'm sure there are more examples, this one is just the most recent in my knowledge). 
This model is recommended for the third party services that aren't part of TripleO but still want to be integrated with it. This usually fits when the service can't be maintained by the TripleO team but there is a desire from outside of the community to maintain some integration (e.g. vendors). We haven't done a good job at providing a full end to end guide on how to achieve this and very often asked people to just do it. I propose that we work on this guide together and today I'm gathering for volunteers who have knowledge on that field or are interested to learn about it and contribute it back directly into a new guide, hosted on tripleo-docs repo. https://bugs.launchpad.net/tripleo/+bug/1892072 This will probably involve a bunch of linking to existing docs but also a good opportunity to update what is outdated in our content and provide more information where needed. Thanks for letting us know if you're interested to be actively contributing into that effort, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Aug 18 16:43:54 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 18 Aug 2020 09:43:54 -0700 Subject: GHC Mentors Needed for OpenStack In-Reply-To: References: Message-ID: Looks like applications are closed already? -Kendall (diablo_rojo) On Tue, Aug 18, 2020 at 6:16 AM Amy Marrich wrote: > Grace Hopper Conference is going virtual this year and once again > OpenStack is participating as one of the Open Source Day projects. We are > hoping to do some peer programming (aka mentees shadowing folks while they > work through a patch) as part of the day. Mentors receive a full > conference pass and AnitaB.org membership. Please check out the > requirements > {0} > and apply > (1) > by August 19, 2020. > > We are also figuring a way for more folks to be able to mentor, so if > you'd like to help but aren't interested in the conference please reach out > to me or Victoria(vkmc) by email or on IRC. > > Thanks and apologies for the short deadline though I can probably get let > additions in:) > > Amy (spotz) > > 0- Grace Hopper mentorship requirements: > https://ghc.anitab.org/get-involved/volunteer/committee-members-and-scholarship-reviewers-2 > > 1- Grace Hopper mentorship application: > https://ghc.anitab.org/get-involved/volunteer/committee-members-and-scholarship-reviewers-2 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Tue Aug 18 16:55:55 2020 From: amy at demarco.com (Amy Marrich) Date: Tue, 18 Aug 2020 11:55:55 -0500 Subject: GHC Mentors Needed for OpenStack In-Reply-To: References: Message-ID: We're working on being allowed to give them a list as the date we originally had changed. Amy (spotz) On Tue, Aug 18, 2020 at 11:44 AM Kendall Nelson wrote: > Looks like applications are closed already? > > -Kendall (diablo_rojo) > > On Tue, Aug 18, 2020 at 6:16 AM Amy Marrich wrote: > >> Grace Hopper Conference is going virtual this year and once again >> OpenStack is participating as one of the Open Source Day projects. We are >> hoping to do some peer programming (aka mentees shadowing folks while they >> work through a patch) as part of the day. Mentors receive a full >> conference pass and AnitaB.org membership. Please check out the >> requirements >> {0} >> and apply >> (1) >> by August 19, 2020. 
>> >> We are also figuring a way for more folks to be able to mentor, so if >> you'd like to help but aren't interested in the conference please reach out >> to me or Victoria(vkmc) by email or on IRC. >> >> Thanks and apologies for the short deadline though I can probably get let >> additions in:) >> >> Amy (spotz) >> >> 0- Grace Hopper mentorship requirements: >> https://ghc.anitab.org/get-involved/volunteer/committee-members-and-scholarship-reviewers-2 >> >> 1- Grace Hopper mentorship application: >> https://ghc.anitab.org/get-involved/volunteer/committee-members-and-scholarship-reviewers-2 >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Tue Aug 18 16:58:29 2020 From: amy at demarco.com (Amy Marrich) Date: Tue, 18 Aug 2020 11:58:29 -0500 Subject: [tripleo] Help needed to write a Third Party integration guide In-Reply-To: References: Message-ID: Emilien, I can definitely help with QA and editing of this if not the actual writing. I'm in the process of making an up to date how to install a virtual cluster, so have been doing some basic installs over and over but nothing more advanced. Thanks, Amy (spotz) On Tue, Aug 18, 2020 at 11:41 AM Emilien Macchi wrote: > Hi people, > > We have been requested several times (for good reasons) about how to write > out-of-tree files for TripleO integration (Heat templates, Ansible roles, > Container Images layouts, etc). > For example, Dell has an external repository ( > https://github.com/dell/tripleo-powerflex) where they have pretty much > all they need to install their services in out-of-tree fashion (I'm sure > there are more examples, this one is just the most recent in my knowledge). > This model is recommended for the third party services that aren't part of > TripleO but still want to be integrated with it. > This usually fits when the service can't be maintained by the TripleO team > but there is a desire from outside of the community to maintain some > integration (e.g. vendors). > > We haven't done a good job at providing a full end to end guide on how to > achieve this and very often asked people to just do it. I propose that we > work on this guide together and today I'm gathering for volunteers who have > knowledge on that field or are interested to learn about it and contribute > it back directly into a new guide, hosted on tripleo-docs repo. > > https://bugs.launchpad.net/tripleo/+bug/1892072 > > This will probably involve a bunch of linking to existing docs but also a > good opportunity to update what is outdated in our content and provide more > information where needed. > > Thanks for letting us know if you're interested to be actively > contributing into that effort, > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig at stackhpc.com Tue Aug 18 20:51:11 2020 From: stig at stackhpc.com (Stig Telfer) Date: Tue, 18 Aug 2020 21:51:11 +0100 Subject: [monasca] Setup Monasca from scratch In-Reply-To: <5e457dae-dc7c-3693-dc34-e622c2cd40f8@cloud.ionos.com> References: <5e457dae-dc7c-3693-dc34-e622c2cd40f8@cloud.ionos.com> Message-ID: Hey Antonios - This will depend a good deal on the method you're using for deploying OpenStack. I've used the Kolla-Ansible documentation for Monasca [https://docs.openstack.org/kolla-ansible/ussuri/reference/logging-and-monitoring/monasca-guide.html] and found it helpful for getting started. I'm sure there are other guides out there too. 
If you get stuck I also recommend trying the #openstack-monasca IRC channel. Cheers, Stig > On 18 Aug 2020, at 09:24, Antonios Dimtsoudis wrote: > > Hi all, > > i am trying to set up Monasca from scratch. Is there a good introduction / point to start of you would recommend? > > Thanks in advance, > > Antonios. > > From stig at stackhpc.com Tue Aug 18 20:55:13 2020 From: stig at stackhpc.com (Stig Telfer) Date: Tue, 18 Aug 2020 21:55:13 +0100 Subject: [scientific-sig] IRC meeting starting shortly Message-ID: Hi All - We have a Scientific SIG IRC meeting starting shortly in channel #openstack-meeting. Everyone is welcome. This week's agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_August_18th_2020 We'd like to cover some upcoming (virtual) events - Supercomputing 2020 and the OpenStack PTG. Cheers, Stig From melwittt at gmail.com Tue Aug 18 21:10:40 2020 From: melwittt at gmail.com (melanie witt) Date: Tue, 18 Aug 2020 14:10:40 -0700 Subject: [neutron][gate] verbose q-svc log files and e-r indexing In-Reply-To: <20200818150052.u4xkjsptejikwcny@skaplons-mac> References: <20200818103323.wq5upyjn4nzsqhx7@skaplons-mac> <20200818150052.u4xkjsptejikwcny@skaplons-mac> Message-ID: <62e4fcd2-0f7a-a7d3-7692-3ad9a05c8399@gmail.com> On 8/18/20 08:00, Slawek Kaplonski wrote: > Hi, > > I proposed patch [1] which seems that decreased size of the neutron-server log > a bit - see [2] but it's still about 40M :/ > > [1] https://review.opendev.org/#/c/730879/ > [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcdn.com/746714/1/check/tempest-full-py3/5c1399c/controller/logs/ Thanks for jumping in to help, Slawek! Indeed your proposed patch improves things from 60M-70M => 40M (good!). With your patch applied, the most frequent potential log message I see now is like this: Aug 18 14:40:21.294549 ubuntu-bionic-rax-iad-0019321276 neutron-server[5829]: DEBUG neutron_lib.callbacks.manager [None req-eadfbe92-eaee-4e3e-a5c0-f18aa8ba9772 None None] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8764691834039', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-4033733'] for agent, after_update {{(pid=6206) _notify_loop /opt/stack/neutron-lib/neutron_lib/callbacks/manager.py:193}} with the line count difference being with and without: $ wc -l "screen-q-svc.txt" 102493 screen-q-svc.txt $ grep -v "neutron_lib.callbacks.manager" "screen-q-svc.txt" |wc -l 83261 so I suppose we could predict a decrease in file size of about 40M => 32M if we were able to remove the neutron_lib.callbacks.manager output. But I'm not sure whether that's a critical debugging element or not. -melanie From christophe.sauthier at objectif-libre.com Tue Aug 18 21:20:39 2020 From: christophe.sauthier at objectif-libre.com (Christophe Sauthier) Date: Tue, 18 Aug 2020 17:20:39 -0400 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Hello everyone Sorry it took me a few days to answer that thread. First of all I am REALLY REALLY happy to see that a few persones are stepping up to continue to work on Cloudkitty. The situation is, like usually, a chaining of events (and honestly Thomas it is absolutely not related to the sale of Objectif Libre by Linkbynet). In late 2019 we tried to push hard to organize a community around Cloudkitty. 
We have tried to organise a few calls with some users, explaining to them the next challenges that the project will be facing and how we could all work on that. As is the case for many projects, we had little/no feedback... By early 2020 we had some turnover in the company (once again not related to the sale) and we started to organise ourselves to continue our ongoing work on Cloudkitty like we have been doing since the beginning of the project, which I started some years ago... And then the COVID crisis arrived, and like many companies in the world we had to change our priorities... During the end of summer (before holidays..) we started to organize again internally to continue that work. So it is great news that a community is rising, and we will be really happy to work with the rest of it to continue to improve Cloudkitty, especially since, like Thomas said, "It does the job" :) Christophe On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez wrote: > Thomas Goirand wrote: > > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: > >> Thanks, Pierre for helping with this. > >> > >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) < justin.ferrieu at objectif-libre.com>) > >> but I am not sure if he got any response back. > > No response so far, but they may all be in company summer vacation. > > > The end of the very good maintenance of Cloudkitty matched the date when > > objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? > > > > This is very disappointing as I've been using it for some time already, > > and that I was satisfied by it (ie: it does the job...), and especially > > that latest releases are able to scale correctly. > > > > I very much would love if Pierre Riteau was successful in taking over. > > Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. > > Given the volunteers (Pierre, Rafael, Luis) I would support the TC using > its unholy powers to add extra core reviewers to cloudkitty. > > If the current PTL comes back, I'm sure they will appreciate the help, > and can always fix/revert things before Victoria release. > > -- > Thierry Carrez (ttx) > > -- ---- Christophe Sauthier Directeur Général Objectif Libre : Au service de votre Cloud +33 (0) 6 16 98 63 96 | christophe.sauthier at objectif-libre.com https://www.objectif-libre.com | @objectiflibre Recevez la Pause Cloud Et DevOps : https://olib.re/abo-pause -------------- next part -------------- An HTML attachment was scrubbed... URL: From iwienand at redhat.com Tue Aug 18 23:52:47 2020 From: iwienand at redhat.com (Ian Wienand) Date: Wed, 19 Aug 2020 09:52:47 +1000 Subject: [simplification] Making ask.openstack.org read-only In-Reply-To: References: Message-ID: <20200818235247.GA341779@fedora19.localdomain> On Tue, Aug 18, 2020 at 12:44:43PM +0200, Thierry Carrez wrote: > I think it's time to pull the plug, make ask.openstack.org read-only (so > that links to old answers are not lost) and redirect users to the > mailing-list and the "OpenStack" tag on StackOverflow. I picked > StackOverflow since it seems to have the most openstack questions (2,574 on > SO, 76 on SuperUser and 430 on ServerFault). I agree that this is the most pragmatic approach. > Thoughts, comments? *If* we were to restore it now, it looks like the 0.11 branch comes with an upstream Dockerfile [1]; there's lots of examples now in system-config of similar container-based production sites and this could fit in.
This makes it significantly easier than trying to build up everything it requires from scratch, and if upstream keep their container compatible (a big if...) theoretically less work to keep updated. But despite the self-hosting story being better in 2020, I agree the ROI isn't there. -i [1] https://github.com/ASKBOT/askbot-devel/blob/0.11.x/Dockerfile From fungi at yuggoth.org Wed Aug 19 00:03:59 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 19 Aug 2020 00:03:59 +0000 Subject: [simplification] Making ask.openstack.org read-only In-Reply-To: <20200818235247.GA341779@fedora19.localdomain> References: <20200818235247.GA341779@fedora19.localdomain> Message-ID: <20200819000359.mhz43jvop5vtcgct@yuggoth.org> On 2020-08-19 09:52:47 +1000 (+1000), Ian Wienand wrote: [...] > *If* we were to restore it now, it looks like 0.11 branch comes with > an upstream Dockerfile [1]; there's lots of examples now in > system-config of similar container-based production sites and this > could fit in. > > This makes it significantly easier than trying to build up everything > it requires from scratch, and if upstream keep their container > compatible (a big if...) theoretically less work to keep updated. [...] Which also brings up another point: right now we're running it on Ubuntu Xenial (16.04 LTS) which is scheduled to reach EOL early next year, and the tooling we're using to deploy it isn't going to work on newer Ubuntu releases. Even keeping it up in a read-only state is timebound to how long we can safely keep its server online. If we switch ask.openstack.org to read-only now, I would still plan to turn it off entirely on or before April 1, 2021. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From johnsomor at gmail.com Wed Aug 19 00:35:05 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 18 Aug 2020 17:35:05 -0700 Subject: [simplification] Making ask.openstack.org read-only In-Reply-To: <20200819000359.mhz43jvop5vtcgct@yuggoth.org> References: <20200818235247.GA341779@fedora19.localdomain> <20200819000359.mhz43jvop5vtcgct@yuggoth.org> Message-ID: Yes! ask.openstack.org is no fun to attempt to be helpful on (see e-mail notification issues, etc.). I would like to ask that we put together some sort of guide and/or guidence for how to use stack overflow efficiently for OpenStack questions. I.e. some well known or defined tags that we recommend people use when asking questions. This would be similar to the tags we use for the openstack discuss list. I see that there is already a trend for "openstack-nova" "openstack-horizon", etc. This works for me. This way we can setup notifications for these tags and be much more efficient at getting people answers. Thanks Thierry for moving this forward! Michael On Tue, Aug 18, 2020 at 5:10 PM Jeremy Stanley wrote: > > On 2020-08-19 09:52:47 +1000 (+1000), Ian Wienand wrote: > [...] > > *If* we were to restore it now, it looks like 0.11 branch comes with > > an upstream Dockerfile [1]; there's lots of examples now in > > system-config of similar container-based production sites and this > > could fit in. > > > > This makes it significantly easier than trying to build up everything > > it requires from scratch, and if upstream keep their container > > compatible (a big if...) theoretically less work to keep updated. > [...] 
> > Which also brings up another point: right now we're running it on > Ubuntu Xenial (16.04 LTS) which is scheduled to reach EOL early next > year, and the tooling we're using to deploy it isn't going to work > on newer Ubuntu releases. Even keeping it up in a read-only state is > timebound to how long we can safely keep its server online. If we > switch ask.openstack.org to read-only now, I would still plan to > turn it off entirely on or before April 1, 2021. > -- > Jeremy Stanley From sam47priya at gmail.com Wed Aug 19 01:45:59 2020 From: sam47priya at gmail.com (Sam P) Date: Wed, 19 Aug 2020 10:45:59 +0900 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: Message-ID: Hi All, In past few weeks I was not able to manage time to properly maintain the project. Really sorry for that. If you would like to help out, I will add you as core member to project and we can discuss how to proceed. If there are no objections, I will add the following members to the core team. suzhengwei Jegor van Opdorp Radosław Piliszek --- Regards, Sampath On Mon, Aug 17, 2020 at 11:13 PM Jegor van Opdorp wrote: > > We're also using masakari and willing to help maintain it! > ________________________________ > From: Mark Goddard > Sent: Monday, August 17, 2020 12:12 PM > To: Jegor van Opdorp > Subject: Fwd: [tc][masakari] Project aliveness (was: [masakari] Meetings) > > ---------- Forwarded message --------- > From: Radosław Piliszek > Date: Fri, 14 Aug 2020 at 08:53 > Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) > To: openstack-discuss > Cc: Sampath Priyankara (samP) , Tushar Patil > (tpatil) > > > Hi, > > it's been a month since I wrote the original (quoted) email, so I > retry it with CC to the PTL and a recently (this year) active core. > > I see there have been no meetings and neither Masakari IRC channel nor > review queues have been getting much attention during that time > period. > I am, therefore, offering my help to maintain the project. > > Regarding the original topic, I would opt for running Masakari > meetings during the time I proposed so that interested parties could > join and I know there is at least some interest based on recent IRC > activity (i.e. there exist people who want to use and discuss Masakari > - apart from me that is :-) ). > > -yoctozepto > > > On Mon, Jul 13, 2020 at 9:53 PM Radosław Piliszek > wrote: > > > > Hello Fellow cloud-HA-seekers, > > > > I wanted to attend Masakari meetings but I found the current schedule unfit. > > Is there a chance to change the schedule? The day is fine but a shift > > by +3 hours would be nice. > > > > Anyhow, I wanted to discuss [1]. I've already proposed a change > > implementing it and looking forward to positive reviews. :-) That > > said, please reply on the change directly, or mail me or catch me on > > IRC, whichever option sounds best to you. > > > > [1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key > > > > -yoctozepto From mnaser at vexxhost.com Wed Aug 19 02:30:08 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 18 Aug 2020 22:30:08 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: <6613245.ccrTHCtBl7@antares> Message-ID: On Tue, Aug 18, 2020 at 10:53 AM Assaf Muller wrote: > > On Tue, Aug 18, 2020 at 8:12 AM Jonas Schäfer > wrote: > > > > Hi Mohammed and all, > > > > On Montag, 17. 
August 2020 14:01:55 CEST Mohammed Naser wrote: > > > Over the past few days, we were troubleshooting an issue that ended up > > > having a root cause where keepalived has somehow ended up active in > > > two different L3 agents. We've yet to find the root cause of how this > > > happened but removing it and adding it resolved the issue for us. > > > > We’ve also seen that behaviour occasionally. The root cause is also unclear > > for us (so we would’ve love to hear about that). > > Insert shameless plug for the Neutron OVN backend. One of it's > advantages is that it's L3 HA architecture is cleaner and more > scalable (this is coming from the dude that wrote the L3 HA code we're > all suffering from =D). The ML2/OVS L3 HA architecture has it's issues > - I've seen it work at 100's of customer sites at scale, so I don't > want to knock it too much, but just a day ago I got an internal > customer ticket about keepalived falling over on a particular router > that has 200 floating IPs. It works but it's not perfect. I'm sure the > OVN implementation isn't either but it's simply cleaner and has less > moving parts. It uses BFD to monitor the tunnel endpoints, so failover > is faster too. Plus, it doesn't use keepalived. > OVN is something we're looking at and we're very excited about, unfortunately, there seems to be a bunch of gaps in documentation right now as well as a lot of the migration scripts to OVN are TripleO-y. So it'll take time to get us there, but yes, OVN simplifies this greatly > > We have anecdotal evidence > > that a rabbitmq failure was involved, although that makes no sense to me > > personally. Other causes may be incorrectly cleaned-up namespaces (for > > example, when you kill or hard-restart the l3 agent, the namespaces will stay > > around, possibly with the IP address assigned; the keepalived on the other l3 > > agents will not see the VRRP advertisments anymore and will ALSO assign the IP > > address. This will also be rectified by a restart always and may require > > manual namespace cleanup with a tool, a node reboot or an agent disable/enable > > cycle.). > > > > > As we work on improving our monitoring, we wanted to implement > > > something that gets us the info of # of active routers to check if > > > there's a router that has >1 active L3 agent but it's hard because > > > hitting the /l3-agents endpoint on _every_ single router hurts a lot > > > on performance. > > > > > > Is there something else that we can watch which might be more > > > productive? FYI -- this all goes in the open and will end up inside > > > the openstack-exporter: > > > https://github.com/openstack-exporter/openstack-exporter and the Helm > > > charts will end up with the alerts: > > > https://github.com/openstack-exporter/helm-charts > > > > While I don’t think it fits in your openstack-exporter design, we are > > currently using the attached script (which we also hereby publish under the > > terms of the Apache 2.0 license [1]). (Sorry, I lack the time to cleanly > > publish it somewhere right now.) > > > > It checks the state files maintained by the L3 agent conglomerate and exports > > metrics about the master-ness of the routers as prometheus metrics. > > > > Note that this is slightly dangerous since the router IDs are high-cardinality > > and using that as a label value in Prometheus is discouraged; you may not want > > to do this in a public cloud setting. > > > > Either way: This allows us to alert on routers where there is not exactly one > > master state. 
Downside is that this requires the thing to run locally on the > > l3 agent nodes. Upside is that it is very efficient, and will also show the > > master state in some cases where the router was not cleaned up properly (e.g. > > because the l3 agent and its keepaliveds were killed). > > kind regards, > > Jonas > > > > [1]: http://www.apache.org/licenses/LICENSE-2.0 > > -- > > Jonas Schäfer > > DevOps Engineer > > > > Cloud&Heat Technologies GmbH > > Königsbrücker Straße 96 | 01099 Dresden > > +49 351 479 367 37 > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > > > New Service: > > Managed Kubernetes designed for AI & ML > > https://managed-kubernetes.cloudandheat.com/ > > > > Commercial Register: District Court Dresden > > Register Number: HRB 30549 > > VAT ID No.: DE281093504 > > Managing Director: Nicolas Röhrs > > Authorized signatory: Dr. Marius Feldmann > > Authorized signatory: Kristina Rübenkamp > > -- Mohammed Naser VEXXHOST, Inc. From mnaser at vexxhost.com Wed Aug 19 02:31:17 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 18 Aug 2020 22:31:17 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: <6613245.ccrTHCtBl7@antares> References: <6613245.ccrTHCtBl7@antares> Message-ID: On Tue, Aug 18, 2020 at 8:12 AM Jonas Schäfer wrote: > > Hi Mohammed and all, > > On Montag, 17. August 2020 14:01:55 CEST Mohammed Naser wrote: > > Over the past few days, we were troubleshooting an issue that ended up > > having a root cause where keepalived has somehow ended up active in > > two different L3 agents. We've yet to find the root cause of how this > > happened but removing it and adding it resolved the issue for us. > > We’ve also seen that behaviour occasionally. The root cause is also unclear > for us (so we would’ve love to hear about that). We have anecdotal evidence > that a rabbitmq failure was involved, although that makes no sense to me > personally. Other causes may be incorrectly cleaned-up namespaces (for > example, when you kill or hard-restart the l3 agent, the namespaces will stay > around, possibly with the IP address assigned; the keepalived on the other l3 > agents will not see the VRRP advertisments anymore and will ALSO assign the IP > address. This will also be rectified by a restart always and may require > manual namespace cleanup with a tool, a node reboot or an agent disable/enable > cycle.). > > > As we work on improving our monitoring, we wanted to implement > > something that gets us the info of # of active routers to check if > > there's a router that has >1 active L3 agent but it's hard because > > hitting the /l3-agents endpoint on _every_ single router hurts a lot > > on performance. > > > > Is there something else that we can watch which might be more > > productive? FYI -- this all goes in the open and will end up inside > > the openstack-exporter: > > https://github.com/openstack-exporter/openstack-exporter and the Helm > > charts will end up with the alerts: > > https://github.com/openstack-exporter/helm-charts > > While I don’t think it fits in your openstack-exporter design, we are > currently using the attached script (which we also hereby publish under the > terms of the Apache 2.0 license [1]). (Sorry, I lack the time to cleanly > publish it somewhere right now.) > > It checks the state files maintained by the L3 agent conglomerate and exports > metrics about the master-ness of the routers as prometheus metrics. 
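The script referenced above was attached to the original message and is not reproduced here. Purely as an illustration of the approach being described, a minimal exporter could look like the following sketch; it assumes the default layout in which the L3 agent writes each HA router's state to /var/lib/neutron/ha_confs/<router-id>/state as "master" or "backup", and the metric name and listen port are invented for the example.

    #!/usr/bin/env python3
    # Illustrative sketch only -- not the script attached to the original mail.
    # Exposes a per-router gauge based on the L3 agent's keepalived state files.
    import os
    import time

    from prometheus_client import Gauge, start_http_server

    HA_CONFS = "/var/lib/neutron/ha_confs"  # assumed default ha_confs_path
    ROUTER_MASTER = Gauge(
        "neutron_l3_router_master",  # hypothetical metric name
        "1 if this L3 agent currently holds the master state for the router",
        ["router_id"],
    )

    def scrape():
        for router_id in os.listdir(HA_CONFS):
            state_file = os.path.join(HA_CONFS, router_id, "state")
            try:
                with open(state_file) as f:
                    state = f.read().strip()
            except OSError:
                continue  # not an HA router directory, or state not written yet
            ROUTER_MASTER.labels(router_id=router_id).set(1 if state == "master" else 0)

    if __name__ == "__main__":
        start_http_server(9300)  # hypothetical port
        while True:
            scrape()
            time.sleep(30)

Summing that gauge per router across all L3 agents then lets an alert fire whenever the total is not exactly one, which is the condition discussed in this thread.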
> > Note that this is slightly dangerous since the router IDs are high-cardinality > and using that as a label value in Prometheus is discouraged; you may not want > to do this in a public cloud setting. > > Either way: This allows us to alert on routers where there is not exactly one > master state. Downside is that this requires the thing to run locally on the > l3 agent nodes. Upside is that it is very efficient, and will also show the > master state in some cases where the router was not cleaned up properly (e.g. > because the l3 agent and its keepaliveds were killed). This seems sweet. Let me go over the code. I might package this up into something consumable and host it inside OpenDev, if that's okay with you? > kind regards, > Jonas > > [1]: http://www.apache.org/licenses/LICENSE-2.0 > -- > Jonas Schäfer > DevOps Engineer > > Cloud&Heat Technologies GmbH > Königsbrücker Straße 96 | 01099 Dresden > +49 351 479 367 37 > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > New Service: > Managed Kubernetes designed for AI & ML > https://managed-kubernetes.cloudandheat.com/ > > Commercial Register: District Court Dresden > Register Number: HRB 30549 > VAT ID No.: DE281093504 > Managing Director: Nicolas Röhrs > Authorized signatory: Dr. Marius Feldmann > Authorized signatory: Kristina Rübenkamp -- Mohammed Naser VEXXHOST, Inc. From dev.faz at gmail.com Wed Aug 19 04:23:38 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 19 Aug 2020 06:23:38 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: Message-ID: Hi, if nobody complains I also would like to request core status to help getting the project further. Fabian Zimmermann Sam P schrieb am Mi., 19. Aug. 2020, 03:50: > Hi All, > In past few weeks I was not able to manage time to properly maintain > the project. > Really sorry for that. If you would like to help out, I will add you > as core member to project and we can discuss how to proceed. > > If there are no objections, I will add the following members to the core > team. > suzhengwei > Jegor van Opdorp > Radosław Piliszek > > --- Regards, > Sampath > > On Mon, Aug 17, 2020 at 11:13 PM Jegor van Opdorp > wrote: > > > > We're also using masakari and willing to help maintain it! > > ________________________________ > > From: Mark Goddard > > Sent: Monday, August 17, 2020 12:12 PM > > To: Jegor van Opdorp > > Subject: Fwd: [tc][masakari] Project aliveness (was: [masakari] Meetings) > > > > ---------- Forwarded message --------- > > From: Radosław Piliszek > > Date: Fri, 14 Aug 2020 at 08:53 > > Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) > > To: openstack-discuss > > Cc: Sampath Priyankara (samP) , Tushar Patil > > (tpatil) > > > > > > Hi, > > > > it's been a month since I wrote the original (quoted) email, so I > > retry it with CC to the PTL and a recently (this year) active core. > > > > I see there have been no meetings and neither Masakari IRC channel nor > > review queues have been getting much attention during that time > > period. > > I am, therefore, offering my help to maintain the project. > > > > Regarding the original topic, I would opt for running Masakari > > meetings during the time I proposed so that interested parties could > > join and I know there is at least some interest based on recent IRC > > activity (i.e. there exist people who want to use and discuss Masakari > > - apart from me that is :-) ). 
> > > > -yoctozepto > > > > > > On Mon, Jul 13, 2020 at 9:53 PM Radosław Piliszek > > wrote: > > > > > > Hello Fellow cloud-HA-seekers, > > > > > > I wanted to attend Masakari meetings but I found the current schedule > unfit. > > > Is there a chance to change the schedule? The day is fine but a shift > > > by +3 hours would be nice. > > > > > > Anyhow, I wanted to discuss [1]. I've already proposed a change > > > implementing it and looking forward to positive reviews. :-) That > > > said, please reply on the change directly, or mail me or catch me on > > > IRC, whichever option sounds best to you. > > > > > > [1] > https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key > > > > > > -yoctozepto > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza.b2008 at gmail.com Wed Aug 19 05:42:20 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Wed, 19 Aug 2020 10:12:20 +0430 Subject: VM doesn't have internet - OpenStack Ussuri with OVN networking In-Reply-To: References: Message-ID: The problem was solved. It was due to the underlying macvtap bridge. On Sat, 15 Aug 2020 at 17:38, Reza Bakhshayeshi wrote: > Hi all, > > I've set up OpenStack Ussuri with OVN networking manually, VMs can ping > each other through an internal network. I've created a provider network > with valid IP subnet, and my problem is VMs don't have internet access > before and after assigning floating IP. > I've encountered the same problem on TripleO (with dvr), and I just wanted > to investigate the problem by manual installation (without HA and DVR), but > the same happened. > Everything seems working properly, I can't see any error in logs, here is > agent list output: > > [root at controller ~]# openstack network agent list > > +--------------------------------------+------------------------------+------------------------+-------------------+-------+-------+-------------------------------+ > | ID | Agent Type | > Host | Availability Zone | Alive | State | Binary > | > > +--------------------------------------+------------------------------+------------------------+-------------------+-------+-------+-------------------------------+ > | 1ade76ae-6caf-4942-8df3-e3bc39d2f12d | OVN Controller Gateway agent | > controller.localdomain | n/a | :-) | UP | ovn-controller > | > | 484f123f-5935-44ce-aee7-4102271d9f11 | OVN Controller agent | > compute.localdomain | n/a | :-) | UP | ovn-controller > | > | 01235c13-4f32-4c4f-8cf6-e4b8d59a438a | OVN Metadata agent | > compute.localdomain | n/a | :-) | UP | > networking-ovn-metadata-agent | > > +--------------------------------------+------------------------------+------------------------+-------------------+-------+-------+-------------------------------+ > > On the controller I got br-ex with a valid IP address. here is the > external-ids table on controller and compute node: > > [root at controller ~]# ovs-vsctl get Open_vSwitch . external-ids > {hostname=controller.localdomain, ovn-bridge=br-int, > ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.0.0.11", > ovn-encap-type=geneve, ovn-remote="tcp:10.0.0.11:6642", > rundir="/var/run/openvswitch", > system-id="1ade76ae-6caf-4942-8df3-e3bc39d2f12d"} > > [root at compute ~]# ovs-vsctl get Open_vSwitch . 
external-ids > {hostname=compute.localdomain, ovn-bridge=br-int, > ovn-encap-ip="10.0.0.31", ovn-encap-type=geneve, ovn-remote="tcp: > 10.0.0.11:6642", rundir="/var/run/openvswitch", > system-id="484f123f-5935-44ce-aee7-4102271d9f11"} > > and I have: > > [root at controller ~]# ovn-nbctl show > switch 72fd5c08-6852-4d7e-b9b4-7e0a1ccdd976 > (neutron-b8c66c3d-f47a-42a5-bd2d-c40c435c0376) (aka net01) > port cf99f43b-0a18-4b91-9ca5-b6ed3f86d994 > type: localport > addresses: ["fa:16:3e:d0:df:82 192.168.0.100"] > port 4268f511-bee3-4da0-8835-b9a8664101c4 > addresses: ["fa:16:3e:35:f2:02 192.168.0.135"] > port 846919e8-cde5-4ba3-b003-0c06e73676ed > type: router > router-port: lrp-846919e8-cde5-4ba3-b003-0c06e73676ed > switch bb22224e-e1d1-4bb2-b57e-1058e9fc33a7 > (neutron-9614546f-b216-4554-9bfe-e8d6bb11d927) (aka provider) > port 2f05c7bc-ad0f-4a41-bbd8-5fef1f5bfd2c > type: localport > addresses: ["fa:16:3e:17:7b:5b X.X.X.X"] > port provnet-9614546f-b216-4554-9bfe-e8d6bb11d927 > type: localnet > addresses: ["unknown"] > port 23fcdc9d-2d11-40c9-881e-c78e871a3314 > type: router > router-port: lrp-23fcdc9d-2d11-40c9-881e-c78e871a3314 > router 0bd35585-b0a3-4c8f-b71b-cb87c9fad060 > (neutron-8cdcd0d2-752c-4130-87bb-d2b7af803ec9) (aka router01) > port lrp-846919e8-cde5-4ba3-b003-0c06e73676ed > mac: "fa:16:3e:4d:c3:f9" > networks: ["192.168.0.1/24"] > port lrp-23fcdc9d-2d11-40c9-881e-c78e871a3314 > mac: "fa:16:3e:94:89:8e" > networks: ["X.X.X.X/22"] > gateway chassis: [1ade76ae-6caf-4942-8df3-e3bc39d2f12d > 484f123f-5935-44ce-aee7-4102271d9f11] > nat 8ef6167a-bc28-4caf-8af5-d0bf12a62545 > external ip: " X.X.X.X " > logical ip: "192.168.0.135" > type: "dnat_and_snat" > nat ba32ab93-3d2b-4199-b634-802f0f438338 > external ip: " X.X.X.X " > logical ip: "192.168.0.0/24" > type: "snat" > > I replaced valid IPs with X.X.X.X > > Any suggestion would be grateful. > Regards, > Reza > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.schaefer at cloudandheat.com Wed Aug 19 05:58:16 2020 From: jonas.schaefer at cloudandheat.com (Jonas =?ISO-8859-1?Q?Sch=E4fer?=) Date: Wed, 19 Aug 2020 07:58:16 +0200 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: <6613245.ccrTHCtBl7@antares> Message-ID: <5669200.AjuPLuGbex@antares> On Mittwoch, 19. August 2020 04:31:17 CEST you wrote: > This seems sweet. Let me go over the code. I might package this up > into something > consumable and host it inside OpenDev, if that's okay with you? Yes sure. I would’ve proposed it for x/osops-tools-contrib myself, but unfortunately I’m very short on time to work on this right now. So thanks for taking this on. kind regards, -- Jonas Schäfer DevOps Engineer Cloud&Heat Technologies GmbH Königsbrücker Straße 96 | 01099 Dresden +49 351 479 367 37 jonas.schaefer at cloudandheat.com | www.cloudandheat.com New Service: Managed Kubernetes designed for AI & ML https://managed-kubernetes.cloudandheat.com/ Commercial Register: District Court Dresden Register Number: HRB 30549 VAT ID No.: DE281093504 Managing Director: Nicolas Röhrs Authorized signatory: Dr. Marius Feldmann Authorized signatory: Kristina Rübenkamp -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. 
URL: From radoslaw.piliszek at gmail.com Wed Aug 19 07:35:22 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 19 Aug 2020 09:35:22 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: Message-ID: Hello Sampath, I'm really glad you are doing well! Also, thanks for approving the Train release. :-) Please let me know how we should proceed with the meetings. I can start them on Tuesdays at 7 AM UTC. And since the Masakari own channel is quite a peaceful one, I would suggest to run them there directly. What are your thoughts? :-) Kind regards, -yoctozepto On Wed, Aug 19, 2020 at 3:49 AM Sam P wrote: > > Hi All, > In past few weeks I was not able to manage time to properly maintain > the project. > Really sorry for that. If you would like to help out, I will add you > as core member to project and we can discuss how to proceed. > > If there are no objections, I will add the following members to the core team. > suzhengwei > Jegor van Opdorp > Radosław Piliszek > > --- Regards, > Sampath > > On Mon, Aug 17, 2020 at 11:13 PM Jegor van Opdorp wrote: > > > > We're also using masakari and willing to help maintain it! > > ________________________________ > > From: Mark Goddard > > Sent: Monday, August 17, 2020 12:12 PM > > To: Jegor van Opdorp > > Subject: Fwd: [tc][masakari] Project aliveness (was: [masakari] Meetings) > > > > ---------- Forwarded message --------- > > From: Radosław Piliszek > > Date: Fri, 14 Aug 2020 at 08:53 > > Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) > > To: openstack-discuss > > Cc: Sampath Priyankara (samP) , Tushar Patil > > (tpatil) > > > > > > Hi, > > > > it's been a month since I wrote the original (quoted) email, so I > > retry it with CC to the PTL and a recently (this year) active core. > > > > I see there have been no meetings and neither Masakari IRC channel nor > > review queues have been getting much attention during that time > > period. > > I am, therefore, offering my help to maintain the project. > > > > Regarding the original topic, I would opt for running Masakari > > meetings during the time I proposed so that interested parties could > > join and I know there is at least some interest based on recent IRC > > activity (i.e. there exist people who want to use and discuss Masakari > > - apart from me that is :-) ). > > > > -yoctozepto > > > > > > On Mon, Jul 13, 2020 at 9:53 PM Radosław Piliszek > > wrote: > > > > > > Hello Fellow cloud-HA-seekers, > > > > > > I wanted to attend Masakari meetings but I found the current schedule unfit. > > > Is there a chance to change the schedule? The day is fine but a shift > > > by +3 hours would be nice. > > > > > > Anyhow, I wanted to discuss [1]. I've already proposed a change > > > implementing it and looking forward to positive reviews. :-) That > > > said, please reply on the change directly, or mail me or catch me on > > > IRC, whichever option sounds best to you. 
> > > > > > [1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key > > > > > > -yoctozepto > From hemant.sonawane at itera.io Wed Aug 19 07:52:40 2020 From: hemant.sonawane at itera.io (Hemant Sonawane) Date: Wed, 19 Aug 2020 09:52:40 +0200 Subject: [openstack-helm] openstack-helm-images release stable/ussuri loci images building issue Message-ID: Hello, I am trying to build a *loci openstack-helm-images release stable/ussuri *but there is an issue with each image I am building which could be related to the python version and pip package but I am really not sure about it. I also tried to update the python version for each package in requirements repo but it didn't help. Is there any way to upgrade the python version and pip as well in requirements? or does anybody know how to resolve this issue while building openstack-helm-images? I have attached logs for your ready reference. Help will be much appreciated thanks :) -- Thanks and Regards, Hemant Sonawane -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ERROR: Package 'magnum' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'senlin' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'ironic' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'openstack-heat' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'keystone' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'glance' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'neutron' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Package 'cinder' requires a different Python: 2.7.17 not in '>=3.6' ERROR: Could not find a version that satisfies the requirement scandir; python_version < "3.5" (from pathlib2===2.3.5->-c /tmp/wheels/upper-constraints.txt (line 509)) (from versions: none) ERROR: No matching distribution found for scandir; python_version < "3.5" (from pathlib2===2.3.5->-c /tmp/wheels/upper-constraints.txt (line 509)) ERROR: Package 'nova' requires a different Python: 2.7.17 not in '>=3.6' From pierre at stackhpc.com Wed Aug 19 08:22:08 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Wed, 19 Aug 2020 10:22:08 +0200 Subject: [cloudkitty][rdo] Broken cloudkitty RPMs on CentOS8 Message-ID: Hello, This issue was discovered on Kolla train-centos8 images, but I assume it applies to both Train and Ussuri for CentOS 8 in general, not just to Kolla. CloudKitty became timezone-aware in Train: https://review.opendev.org/#/c/669192/ This code references tz.UTC from the dateutil library. However, this was added only in dateutil 2.7.0, while train-centos8 Kolla images use package python3-dateutil-2.6.1-6.el8.noarch.rpm, causing the error captured at the end of this message [2]. I submitted a patch to raise the minimum requirement for dateutil in cloudkitty: https://review.opendev.org/#/c/742477/ However, how are those requirements taken into consideration when packaging OpenStack in RDO? RDO packages for CentOS7 provide python2-dateutil-2.8.0-1.el7.noarch.rpm, but there is no such package in the CentOS8 repository. Would it be better to just remove the use of tz.UTC? I believe we could use dateutil.tz.tzutc() instead. 
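For reference, tz.UTC was only added in python-dateutil 2.7.0 and is simply a module-level tzutc() instance, so the two spellings are interchangeable here; a change along these lines (a sketch of the idea, not the actual CloudKitty patch) would keep working with the dateutil 2.6.1 shipped in CentOS 8:

    # Sketch only -- not the actual cloudkitty change.
    # tz.UTC exists only in python-dateutil >= 2.7.0; tz.tzutc() returns an
    # equivalent UTC tzinfo object and is also available in older releases.
    from dateutil import tz

    _UTC = tz.tzutc()  # instead of tz.UTC

    def to_utc(dt):
        """Convert an aware datetime to UTC (illustrative helper)."""
        return dt.astimezone(_UTC)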
Thanks, Pierre Riteau (priteau) [1] http://mirror.centos.org/centos/8/cloud/x86_64/openstack-train/Packages/p/ [2] Error trace below: 2020-07-22 16:33:11.207 26 ERROR wsme.api [req-3c49884a-1412-42bb-a57e-e7c731360148 ef450a969a2945928d3ade785eaae860 19df0f36ede14c29be9ca476222f8ba9 default - -] Server-side error: "module 'dateutil.tz' has no attribute 'UTC'". Detail: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/wsmeext/pecan.py", line 85, in callfunction result = f(self, *args, **kwargs) File "/usr/lib/python3.6/site-packages/cloudkitty/api/v1/controllers/storage.py", line 71, in get_all paginate=False) File "/usr/lib/python3.6/site-packages/cloudkitty/storage/v2/influx.py", line 311, in retrieve begin, end = self._check_begin_end(begin, end) File "/usr/lib/python3.6/site-packages/cloudkitty/storage/v2/influx.py", line 271, in _check_begin_end end = tzutils.get_next_month() File "/usr/lib/python3.6/site-packages/cloudkitty/tzutils.py", line 150, in get_next_month return add_delta(start, datetime.timedelta(days=month_days)) File "/usr/lib/python3.6/site-packages/cloudkitty/tzutils.py", line 104, in add_delta return utc_to_local(local_to_utc(dt, naive=True) + delta) File "/usr/lib/python3.6/site-packages/cloudkitty/tzutils.py", line 52, in local_to_utc output = dt.astimezone(tz.UTC) AttributeError: module 'dateutil.tz' has no attribute 'UTC' From arnaud.morin at gmail.com Wed Aug 19 09:21:30 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Wed, 19 Aug 2020 09:21:30 +0000 Subject: [nova][ops] Live migration and CPU features In-Reply-To: <44347504ff7308a6c3b4155060c778fad368a002.camel@redhat.com> References: <44347504ff7308a6c3b4155060c778fad368a002.camel@redhat.com> Message-ID: <20200819092130.GX31915@sync> Hello, We have the same kind of issue. To help mitigate it, we do segregation and also use cpu_mode=custom, so we can use a model which is close to our hardware (cpu_model=Haswell-noTSX) and add extra_flags when needed. This is painful. Cheers, -- Arnaud Morin On 18.08.20 - 16:16, Sean Mooney wrote: > On Tue, 2020-08-18 at 17:06 +0200, Fabian Zimmermann wrote: > > Hi, > > > > We are using the "custom"-way. But this does not protect you from all issues. > > > > We had problems with a new cpu-generation not (jet) detected correctly > > in an libvirt-version. So libvirt failed back to the "desktop"-cpu of > > this newer generation, but didnt support/detect some features => > > blocked live-migration. > yes that is common when using really new hardware. having previouly worked > at intel and hitting this often that one of the reason i tend to default to host-passthouh > and recommend using AZ or aggreate to segreatate the cloud for live migration. > > in the case where your libvirt does not know about the new cpus your best approch is to use the > newest server cpu model that it know about and then if you really need the new fature you can try > to add theem using the config options but that is effectivly the same as using host-passhtough > which is why i default to that as a workaround instead. > > > > > Fabian > > > > Am Di., 18. Aug. 2020 um 16:54 Uhr schrieb Belmiro Moreira > > : > > > > > > Hi, > > > in our infrastructure we have always compute nodes that need a hardware intervention and as a consequence they are > > > rebooted, bringing a new kernel, kvm, ... > > > > > > In order to have a good compromise between performance and flexibility (live migration) we have been using "host- > > > model" for the "cpu_mode" configuration of our service VMs. 
We didn't expect to have CPU compatibility issues > > > because we have the same hardware type per cell. > > > > > > The problem is that when a compute node is rebooted the instance domain is recreated with the new cpu features that > > > were introduced because of the reboot (using centOS). > > > > > > If there are new CPU features exposed, this basically blocks live migration to all the non rebooted compute nodes > > > (those cpu features are not exposed, yet). The nova-scheduler doesn't know about them when scheduling the live > > > migration destination. > > > > > > I wonder how other operators are solving this issue. > > > I don't like stopping OS upgrades. > > > What I'm considering is to define a "custom" cpu_mode for each hardware type. > > > > > > I would appreciate your comments and learn how you are solving this problem. > > > > > > Belmiro > > > > > > > > > From pierre at stackhpc.com Wed Aug 19 09:34:47 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Wed, 19 Aug 2020 11:34:47 +0200 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: <173c942a17b.dfe050d2111458.180813585646259079@ghanshyammann.com> Message-ID: Hello Christophe, Good to hear that Objectif Libre is still planning to be involved in the project. The existing core reviewer team is still in place. Do let us know if new contributors could be granted core reviewer privileges. Best wishes, Pierre Riteau (priteau) On Tue, 18 Aug 2020 at 23:21, Christophe Sauthier wrote: > > Hello everyone > > Sorry it took me a few days to answer that thread. > > First of all I am REALLY REALLY happy to see that a few persones are stepping up to continue to work on Cloudkitty. > > The situation is, like usually, a chaining of events (and honestly Thomas it is absolutely not related to the sale of Objectif Libre by Linkbynet). > In late 2019 we tried to push hard to organize a community around Cloudkitty. We have tried to organise a few call with some users explaining them the next challenges that the project will be facing and how we could all work on that. Like it is the case for many projects we had little/no feedback... > By early 2020 we had some turn over in the company (once again not related to the sale) and we have started to organise ourself to continue our ongoing on CLoudkitty like we are doing since the beginning of the project, that I have started some years ago... And then the COVID crisis arrived, and like many compagny in the world we had to change our priorities... > During the end of summer (before holidays..) we started to organize again internally to continue that work. So it is a great news that a community is rising, and we will be really happy to work with the rest of it to continue to improve Cloudkitty, especially since like Thomas said "It does the job" :) > > Christophe > > On Tue, Aug 11, 2020 at 6:16 AM Thierry Carrez wrote: >> >> Thomas Goirand wrote: >> > On 8/7/20 4:10 PM, Ghanshyam Mann wrote: >> >> Thanks, Pierre for helping with this. >> >> >> >> ttx has reached out to PTL (Justin Ferrieu (jferrieu) ) >> >> but I am not sure if he got any response back. >> >> No response so far, but they may all be in company summer vacation. >> >> > The end of the very good maintenance of Cloudkitty matched the date when >> > objectif libre was sold to Linkbynet. Maybe the new owner don't care enough? 
>> > >> > This is very disappointing as I've been using it for some time already, >> > and that I was satisfied by it (ie: it does the job...), and especially >> > that latest releases are able to scale correctly. >> > >> > I very much would love if Pierre Riteau was successful in taking over. >> > Good luck Pierre! I'll try to help whenever I can and if I'm not too busy. >> >> Given the volunteers (Pierre, Rafael, Luis) I would support the TC using >> its unholy powers to add extra core reviewers to cloudkitty. >> >> If the current PTL comes back, I'm sure they will appreciate the help, >> and can always fix/revert things before Victoria release. >> >> -- >> Thierry Carrez (ttx) >> > > > -- > > ---- > Christophe Sauthier > Directeur Général > > Objectif Libre : Au service de votre Cloud > > +33 (0) 6 16 98 63 96 | christophe.sauthier at objectif-libre.com > > https://www.objectif-libre.com | @objectiflibre > Recevez la Pause Cloud Et DevOps : https://olib.re/abo-pause From skaplons at redhat.com Wed Aug 19 10:40:29 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 19 Aug 2020 12:40:29 +0200 Subject: [neutron] CI meeting cancelled Message-ID: <20200819104029.u5qsqv36tbovritk@skaplons-mac> Hi, I have today some internal meeting in the same time as Neutron CI meeting is. Also, some of the team members who are usually attending this meeting are on pto this week so lets cancel it. If You see any CI related issue, please open LP and ping me on IRC. -- Slawek Kaplonski Principal software engineer Red Hat From ekultails at gmail.com Wed Aug 19 13:15:06 2020 From: ekultails at gmail.com (Luke Short) Date: Wed, 19 Aug 2020 09:15:06 -0400 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: Hey folks, All of the latest patches to address this have been merged in but we are still seeing this error randomly in CI jobs that involve an Undercloud or Standalone node. As far as I can tell, the error is appearing less often than before but it is still present making merging new patches difficult. I would be happy to help work towards other possible solutions however I am unsure where to start from here. Any help would be greatly appreciated. Sincerely, Luke Short On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin wrote: > > > On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin > wrote: > >> >> >> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: >> >>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin >>> wrote: >>> > >>> > >>> > >>> > On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya >>> wrote: >>> >> >>> >> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >>> >> > >>> >> > >>> >> > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >> >> > > wrote: >>> >> > >>> >> > >>> >> > >>> >> > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz < >>> aschultz at redhat.com >>> >> > > wrote: >>> >> > >>> >> > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >>> >> > > wrote: >>> >> > > >>> >> > > >>> >> > > >>> >> > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >>> >> > > wrote: >>> >> > >> >>> >> > >> FYI... >>> >> > >> >>> >> > >> If you find your jobs are failing with an error similar >>> to >>> >> > [1], you have been rate limited by docker.io < >>> http://docker.io> >>> >> > via the upstream mirror system and have hit [2]. I've been >>> >> > discussing the issue w/ upstream infra, rdo-infra and a few >>> CI >>> >> > engineers. 
>>> >> > >> >>> >> > >> There are a few ways to mitigate the issue however I >>> don't >>> >> > see any of the options being completed very quickly so I'm >>> >> > asking for your patience while this issue is socialized and >>> >> > resolved. >>> >> > >> >>> >> > >> For full transparency we're considering the following >>> options. >>> >> > >> >>> >> > >> 1. move off of docker.io to quay.io >>> >> > >>> >> > > >>> >> > > >>> >> > > quay.io also has API rate limit: >>> >> > > https://docs.quay.io/issues/429.html >>> >> > > >>> >> > > Now I'm not sure about how many requests per seconds one >>> can >>> >> > do vs the other but this would need to be checked with the >>> quay >>> >> > team before changing anything. >>> >> > > Also quay.io had its big downtimes as >>> well, >>> >> > SLA needs to be considered. >>> >> > > >>> >> > >> 2. local container builds for each job in master, >>> possibly >>> >> > ussuri >>> >> > > >>> >> > > >>> >> > > Not convinced. >>> >> > > You can look at CI logs: >>> >> > > - pulling / updating / pushing container images from >>> >> > docker.io to local registry takes ~10 >>> min on >>> >> > standalone (OVH) >>> >> > > - building containers from scratch with updated repos and >>> >> > pushing them to local registry takes ~29 min on standalone >>> (OVH). >>> >> > > >>> >> > >> >>> >> > >> 3. parent child jobs upstream where rpms and containers >>> will >>> >> > be build and host artifacts for the child jobs >>> >> > > >>> >> > > >>> >> > > Yes, we need to investigate that. >>> >> > > >>> >> > >> >>> >> > >> 4. remove some portion of the upstream jobs to lower the >>> >> > impact we have on 3rd party infrastructure. >>> >> > > >>> >> > > >>> >> > > I'm not sure I understand this one, maybe you can give an >>> >> > example of what could be removed? >>> >> > >>> >> > We need to re-evaulate our use of scenarios (e.g. we have >>> two >>> >> > scenario010's both are non-voting). There's a reason we >>> >> > historically >>> >> > didn't want to add more jobs because of these types of >>> resource >>> >> > constraints. I think we've added new jobs recently and >>> likely >>> >> > need to >>> >> > reduce what we run. Additionally we might want to look into >>> reducing >>> >> > what we run on stable branches as well. >>> >> > >>> >> > >>> >> > Oh... removing jobs (I thought we would remove some steps of >>> the jobs). >>> >> > Yes big +1, this should be a continuous goal when working on >>> CI, and >>> >> > always evaluating what we need vs what we run now. >>> >> > >>> >> > We should look at: >>> >> > 1) services deployed in scenarios that aren't worth testing >>> (e.g. >>> >> > deprecated or unused things) (and deprecate the unused things) >>> >> > 2) jobs themselves (I don't have any example beside scenario010 >>> but >>> >> > I'm sure there are more). >>> >> > -- >>> >> > Emilien Macchi >>> >> > >>> >> > >>> >> > Thanks Alex, Emilien >>> >> > >>> >> > +1 to reviewing the catalog and adjusting things on an ongoing >>> basis. >>> >> > >>> >> > All.. it looks like the issues with docker.io >>> were >>> >> > more of a flare up than a change in docker.io >>> policy >>> >> > or infrastructure [2]. The flare up started on July 27 8am utc and >>> >> > ended on July 27 17:00 utc, see screenshots. >>> >> >>> >> The numbers of image prepare workers and its exponential fallback >>> >> intervals should be also adjusted. I've analysed the log snippet [0] >>> for >>> >> the connection reset counts by workers versus the times the rate >>> >> limiting was triggered. 
See the details in the reported bug [1]. >>> >> >>> >> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >>> >> >>> >> Conn Reset Counts by a Worker PID: >>> >> 3 58412 >>> >> 2 58413 >>> >> 3 58415 >>> >> 3 58417 >>> >> >>> >> which seems too much of (workers*reconnects) and triggers rate >>> limiting >>> >> immediately. >>> >> >>> >> [0] >>> >> >>> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >>> >> >>> >> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >>> >> >>> >> -- >>> >> Best regards, >>> >> Bogdan Dobrelya, >>> >> Irc #bogdando >>> >> >>> > >>> > FYI.. >>> > >>> > The issue w/ "too many requests" is back. Expect delays and failures >>> in attempting to merge your patches upstream across all branches. The >>> issue is being tracked as a critical issue. >>> >>> Working with the infra folks and we have identified the authorization >>> header as causing issues when we're rediected from docker.io to >>> cloudflare. I'll throw up a patch tomorrow to handle this case which >>> should improve our usage of the cache. It needs some testing against >>> other registries to ensure that we don't break authenticated fetching >>> of resources. >>> >>> Thanks Alex! >> > > > FYI.. we have been revisited by the container pull issue, "too many > requests". > Alex has some fresh patches on it: > https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 > > expect trouble in check and gate: > > http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjeanner at redhat.com Wed Aug 19 13:19:49 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Wed, 19 Aug 2020 15:19:49 +0200 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: Message-ID: <9c141d62-44db-a1c7-db44-894443d2576f@redhat.com> +1 - and a big thanks to Takashi for his hard work on puppet integration! On 8/18/20 4:28 PM, Emilien Macchi wrote: > Hi people, > > If you don't know Takashi yet, he has been involved in the Puppet > OpenStack project and helped *a lot* in its maintenance (and by > maintenance I mean not-funny-work). When our community was getting > smaller and smaller, he joined us and our review velicity went back to > eleven. He became a core maintainer very quickly and we're glad to have > him onboard. > > He's also been involved in taking care of puppet-tripleo for a few > months and I believe he has more than enough knowledge on the module to > provide core reviews and be part of the core maintainer group. I also > noticed his amount of contribution (bug fixes, improvements, reviews, > etc) in other TripleO repos and I'm confident he'll make his road to be > core in TripleO at some point. For now I would like him to propose him > to be core in puppet-tripleo. > > As usual, any feedback is welcome but in the meantime I want to thank > Takashi for his work in TripleO and we're super happy to have new > contributors! > > Thanks, > -- > Emilien Macchi -- Cédric Jeanneret (He/Him/His) Sr. 
Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From aschultz at redhat.com Wed Aug 19 13:23:18 2020 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 19 Aug 2020 07:23:18 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Wed, Aug 19, 2020 at 7:15 AM Luke Short wrote: > > Hey folks, > > All of the latest patches to address this have been merged in but we are still seeing this error randomly in CI jobs that involve an Undercloud or Standalone node. As far as I can tell, the error is appearing less often than before but it is still present making merging new patches difficult. I would be happy to help work towards other possible solutions however I am unsure where to start from here. Any help would be greatly appreciated. > I'm looking at this today but from what I can tell the problem is likely caused by a reduced anonymous query quota from docker.io and our usage of the upstream mirrors. Because the mirrors essentially funnel all requests through a single IP we're hitting limits faster than if we didn't use the mirrors. Due to the nature of the requests, the metadata queries don't get cached due to the authorization header but are subject to the rate limiting. Additionally we're querying the registry to determine which containers we need to update in CI because we limit our updates to a certain set of containers as part of the CI jobs. So there are likely a few different steps forward on this and we can do a few of these together. 1) stop using mirrors (not ideal but likely makes this go away). Alternatively switch stable branches off the mirrors due to a reduced number of executions and leave mirrors configured on master only (or vice versa). 2) reduce the number of jobs 3) stop querying the registry for the update filters (i'm looking into this today) and use the information in tripleo-common first. 4) build containers always instead of fetching from docker.io Thanks, -Alex > Sincerely, > Luke Short > > On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin wrote: >> >> >> >> On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin wrote: >>> >>> >>> >>> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: >>>> >>>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin wrote: >>>> > >>>> > >>>> > >>>> > On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: >>>> >> >>>> >> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >>>> >> > >>>> >> > >>>> >> > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >>> >> > > wrote: >>>> >> > >>>> >> > >>>> >> > >>>> >> > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >>> >> > > wrote: >>>> >> > >>>> >> > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >>>> >> > > wrote: >>>> >> > > >>>> >> > > >>>> >> > > >>>> >> > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >>>> >> > > wrote: >>>> >> > >> >>>> >> > >> FYI... >>>> >> > >> >>>> >> > >> If you find your jobs are failing with an error similar to >>>> >> > [1], you have been rate limited by docker.io >>>> >> > via the upstream mirror system and have hit [2]. I've been >>>> >> > discussing the issue w/ upstream infra, rdo-infra and a few CI >>>> >> > engineers. 
>>>> >> > >> >>>> >> > >> There are a few ways to mitigate the issue however I don't >>>> >> > see any of the options being completed very quickly so I'm >>>> >> > asking for your patience while this issue is socialized and >>>> >> > resolved. >>>> >> > >> >>>> >> > >> For full transparency we're considering the following options. >>>> >> > >> >>>> >> > >> 1. move off of docker.io to quay.io >>>> >> > >>>> >> > > >>>> >> > > >>>> >> > > quay.io also has API rate limit: >>>> >> > > https://docs.quay.io/issues/429.html >>>> >> > > >>>> >> > > Now I'm not sure about how many requests per seconds one can >>>> >> > do vs the other but this would need to be checked with the quay >>>> >> > team before changing anything. >>>> >> > > Also quay.io had its big downtimes as well, >>>> >> > SLA needs to be considered. >>>> >> > > >>>> >> > >> 2. local container builds for each job in master, possibly >>>> >> > ussuri >>>> >> > > >>>> >> > > >>>> >> > > Not convinced. >>>> >> > > You can look at CI logs: >>>> >> > > - pulling / updating / pushing container images from >>>> >> > docker.io to local registry takes ~10 min on >>>> >> > standalone (OVH) >>>> >> > > - building containers from scratch with updated repos and >>>> >> > pushing them to local registry takes ~29 min on standalone (OVH). >>>> >> > > >>>> >> > >> >>>> >> > >> 3. parent child jobs upstream where rpms and containers will >>>> >> > be build and host artifacts for the child jobs >>>> >> > > >>>> >> > > >>>> >> > > Yes, we need to investigate that. >>>> >> > > >>>> >> > >> >>>> >> > >> 4. remove some portion of the upstream jobs to lower the >>>> >> > impact we have on 3rd party infrastructure. >>>> >> > > >>>> >> > > >>>> >> > > I'm not sure I understand this one, maybe you can give an >>>> >> > example of what could be removed? >>>> >> > >>>> >> > We need to re-evaulate our use of scenarios (e.g. we have two >>>> >> > scenario010's both are non-voting). There's a reason we >>>> >> > historically >>>> >> > didn't want to add more jobs because of these types of resource >>>> >> > constraints. I think we've added new jobs recently and likely >>>> >> > need to >>>> >> > reduce what we run. Additionally we might want to look into reducing >>>> >> > what we run on stable branches as well. >>>> >> > >>>> >> > >>>> >> > Oh... removing jobs (I thought we would remove some steps of the jobs). >>>> >> > Yes big +1, this should be a continuous goal when working on CI, and >>>> >> > always evaluating what we need vs what we run now. >>>> >> > >>>> >> > We should look at: >>>> >> > 1) services deployed in scenarios that aren't worth testing (e.g. >>>> >> > deprecated or unused things) (and deprecate the unused things) >>>> >> > 2) jobs themselves (I don't have any example beside scenario010 but >>>> >> > I'm sure there are more). >>>> >> > -- >>>> >> > Emilien Macchi >>>> >> > >>>> >> > >>>> >> > Thanks Alex, Emilien >>>> >> > >>>> >> > +1 to reviewing the catalog and adjusting things on an ongoing basis. >>>> >> > >>>> >> > All.. it looks like the issues with docker.io were >>>> >> > more of a flare up than a change in docker.io policy >>>> >> > or infrastructure [2]. The flare up started on July 27 8am utc and >>>> >> > ended on July 27 17:00 utc, see screenshots. >>>> >> >>>> >> The numbers of image prepare workers and its exponential fallback >>>> >> intervals should be also adjusted. I've analysed the log snippet [0] for >>>> >> the connection reset counts by workers versus the times the rate >>>> >> limiting was triggered. 
See the details in the reported bug [1]. >>>> >> >>>> >> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >>>> >> >>>> >> Conn Reset Counts by a Worker PID: >>>> >> 3 58412 >>>> >> 2 58413 >>>> >> 3 58415 >>>> >> 3 58417 >>>> >> >>>> >> which seems too much of (workers*reconnects) and triggers rate limiting >>>> >> immediately. >>>> >> >>>> >> [0] >>>> >> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >>>> >> >>>> >> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >>>> >> >>>> >> -- >>>> >> Best regards, >>>> >> Bogdan Dobrelya, >>>> >> Irc #bogdando >>>> >> >>>> > >>>> > FYI.. >>>> > >>>> > The issue w/ "too many requests" is back. Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. >>>> >>>> Working with the infra folks and we have identified the authorization >>>> header as causing issues when we're rediected from docker.io to >>>> cloudflare. I'll throw up a patch tomorrow to handle this case which >>>> should improve our usage of the cache. It needs some testing against >>>> other registries to ensure that we don't break authenticated fetching >>>> of resources. >>>> >>> Thanks Alex! >> >> >> >> FYI.. we have been revisited by the container pull issue, "too many requests". >> Alex has some fresh patches on it: https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 >> >> expect trouble in check and gate: >> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 >> From eblock at nde.ag Wed Aug 19 13:36:16 2020 From: eblock at nde.ag (Eugen Block) Date: Wed, 19 Aug 2020 13:36:16 +0000 Subject: [neutron] Disable dhcp drop rule Message-ID: <20200819133616.Horde.zhXC_mhe4RdzjbP4Shl1M45@webmail.nde.ag> Hi *, we recently upgraded our Ocata Cloud to Train and also switched from linuxbridge to openvswitch. One of our instances within the cloud works as DHCP server and to make that work we had to comment the respective part in this file on the compute node the instance was running on: /usr/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py Now we tried the same in /usr/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py /usr/lib/python3.6/site-packages/neutron/agent/linux/iptables_firewall.py but restarting openstack-neutron-openvswitch-agent.service didn't drop that rule, the DHCP reply didn't get through. To continue with our work we just dropped it manually, so we get by, but since there have been a couple of years between Ocata and Train, is there any smoother or better way to achieve this? This seems to be a reoccuring request but I couldn't find any updates on this topic. Maybe someone here can shed some light? Is there more to change than those two files I mentioned? Any pointers are highly appreciated! 
Best regards, Eugen From ramishra at redhat.com Wed Aug 19 13:37:29 2020 From: ramishra at redhat.com (Rabi Mishra) Date: Wed, 19 Aug 2020 19:07:29 +0530 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: Message-ID: +1 On Tue, Aug 18, 2020 at 8:03 PM Emilien Macchi wrote: > Hi people, > > If you don't know Takashi yet, he has been involved in the Puppet > OpenStack project and helped *a lot* in its maintenance (and by maintenance > I mean not-funny-work). When our community was getting smaller and smaller, > he joined us and our review velicity went back to eleven. He became a core > maintainer very quickly and we're glad to have him onboard. > > He's also been involved in taking care of puppet-tripleo for a few months > and I believe he has more than enough knowledge on the module to provide > core reviews and be part of the core maintainer group. I also noticed his > amount of contribution (bug fixes, improvements, reviews, etc) in other > TripleO repos and I'm confident he'll make his road to be core in TripleO > at some point. For now I would like him to propose him to be core in > puppet-tripleo. > > As usual, any feedback is welcome but in the meantime I want to thank > Takashi for his work in TripleO and we're super happy to have new > contributors! > > Thanks, > -- > Emilien Macchi > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjeanner at redhat.com Wed Aug 19 13:40:08 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Wed, 19 Aug 2020 15:40:08 +0200 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On 8/19/20 3:23 PM, Alex Schultz wrote: > On Wed, Aug 19, 2020 at 7:15 AM Luke Short wrote: >> >> Hey folks, >> >> All of the latest patches to address this have been merged in but we are still seeing this error randomly in CI jobs that involve an Undercloud or Standalone node. As far as I can tell, the error is appearing less often than before but it is still present making merging new patches difficult. I would be happy to help work towards other possible solutions however I am unsure where to start from here. Any help would be greatly appreciated. >> > > I'm looking at this today but from what I can tell the problem is > likely caused by a reduced anonymous query quota from docker.io and > our usage of the upstream mirrors. Because the mirrors essentially > funnel all requests through a single IP we're hitting limits faster > than if we didn't use the mirrors. Due to the nature of the requests, > the metadata queries don't get cached due to the authorization header > but are subject to the rate limiting. Additionally we're querying the > registry to determine which containers we need to update in CI because > we limit our updates to a certain set of containers as part of the CI > jobs. > > So there are likely a few different steps forward on this and we can > do a few of these together. > > 1) stop using mirrors (not ideal but likely makes this go away). > Alternatively switch stable branches off the mirrors due to a reduced > number of executions and leave mirrors configured on master only (or > vice versa). might be good, but it might lead to some other issues - docker might want to rate-limit on container owner. I wouldn't be surprised if they go that way in the future. Could be OK as a first "unlocking step". But we should consider 2) and 3). 
> 2) reduce the number of jobs always a good thing to do, +1 > 3) stop querying the registry for the update filters (i'm looking into > this today) and use the information in tripleo-common first. +1 - thanks for looking into it! > 4) build containers always instead of fetching from docker.io meh... last resort, if really nothing else works... It's time consuming and will lead to other issues within the CI (job timeout and the like), wouldn't it? > > Thanks, > -Alex > > > >> Sincerely, >> Luke Short >> >> On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin wrote: >>> >>> >>> >>> On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin wrote: >>>> >>>> >>>> >>>> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: >>>>> >>>>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: >>>>>>> >>>>>>> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >>>>>>> > wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >>>>>>> > wrote: >>>>>>>> >>>>>>>> On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >>>>>>>> > wrote: >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >>>>>>>> > wrote: >>>>>>>> >> >>>>>>>> >> FYI... >>>>>>>> >> >>>>>>>> >> If you find your jobs are failing with an error similar to >>>>>>>> [1], you have been rate limited by docker.io >>>>>>>> via the upstream mirror system and have hit [2]. I've been >>>>>>>> discussing the issue w/ upstream infra, rdo-infra and a few CI >>>>>>>> engineers. >>>>>>>> >> >>>>>>>> >> There are a few ways to mitigate the issue however I don't >>>>>>>> see any of the options being completed very quickly so I'm >>>>>>>> asking for your patience while this issue is socialized and >>>>>>>> resolved. >>>>>>>> >> >>>>>>>> >> For full transparency we're considering the following options. >>>>>>>> >> >>>>>>>> >> 1. move off of docker.io to quay.io >>>>>>>> >>>>>>>> > >>>>>>>> > >>>>>>>> > quay.io also has API rate limit: >>>>>>>> > https://docs.quay.io/issues/429.html >>>>>>>> > >>>>>>>> > Now I'm not sure about how many requests per seconds one can >>>>>>>> do vs the other but this would need to be checked with the quay >>>>>>>> team before changing anything. >>>>>>>> > Also quay.io had its big downtimes as well, >>>>>>>> SLA needs to be considered. >>>>>>>> > >>>>>>>> >> 2. local container builds for each job in master, possibly >>>>>>>> ussuri >>>>>>>> > >>>>>>>> > >>>>>>>> > Not convinced. >>>>>>>> > You can look at CI logs: >>>>>>>> > - pulling / updating / pushing container images from >>>>>>>> docker.io to local registry takes ~10 min on >>>>>>>> standalone (OVH) >>>>>>>> > - building containers from scratch with updated repos and >>>>>>>> pushing them to local registry takes ~29 min on standalone (OVH). >>>>>>>> > >>>>>>>> >> >>>>>>>> >> 3. parent child jobs upstream where rpms and containers will >>>>>>>> be build and host artifacts for the child jobs >>>>>>>> > >>>>>>>> > >>>>>>>> > Yes, we need to investigate that. >>>>>>>> > >>>>>>>> >> >>>>>>>> >> 4. remove some portion of the upstream jobs to lower the >>>>>>>> impact we have on 3rd party infrastructure. >>>>>>>> > >>>>>>>> > >>>>>>>> > I'm not sure I understand this one, maybe you can give an >>>>>>>> example of what could be removed? >>>>>>>> >>>>>>>> We need to re-evaulate our use of scenarios (e.g. we have two >>>>>>>> scenario010's both are non-voting). 
There's a reason we >>>>>>>> historically >>>>>>>> didn't want to add more jobs because of these types of resource >>>>>>>> constraints. I think we've added new jobs recently and likely >>>>>>>> need to >>>>>>>> reduce what we run. Additionally we might want to look into reducing >>>>>>>> what we run on stable branches as well. >>>>>>>> >>>>>>>> >>>>>>>> Oh... removing jobs (I thought we would remove some steps of the jobs). >>>>>>>> Yes big +1, this should be a continuous goal when working on CI, and >>>>>>>> always evaluating what we need vs what we run now. >>>>>>>> >>>>>>>> We should look at: >>>>>>>> 1) services deployed in scenarios that aren't worth testing (e.g. >>>>>>>> deprecated or unused things) (and deprecate the unused things) >>>>>>>> 2) jobs themselves (I don't have any example beside scenario010 but >>>>>>>> I'm sure there are more). >>>>>>>> -- >>>>>>>> Emilien Macchi >>>>>>>> >>>>>>>> >>>>>>>> Thanks Alex, Emilien >>>>>>>> >>>>>>>> +1 to reviewing the catalog and adjusting things on an ongoing basis. >>>>>>>> >>>>>>>> All.. it looks like the issues with docker.io were >>>>>>>> more of a flare up than a change in docker.io policy >>>>>>>> or infrastructure [2]. The flare up started on July 27 8am utc and >>>>>>>> ended on July 27 17:00 utc, see screenshots. >>>>>>> >>>>>>> The numbers of image prepare workers and its exponential fallback >>>>>>> intervals should be also adjusted. I've analysed the log snippet [0] for >>>>>>> the connection reset counts by workers versus the times the rate >>>>>>> limiting was triggered. See the details in the reported bug [1]. >>>>>>> >>>>>>> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >>>>>>> >>>>>>> Conn Reset Counts by a Worker PID: >>>>>>> 3 58412 >>>>>>> 2 58413 >>>>>>> 3 58415 >>>>>>> 3 58417 >>>>>>> >>>>>>> which seems too much of (workers*reconnects) and triggers rate limiting >>>>>>> immediately. >>>>>>> >>>>>>> [0] >>>>>>> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >>>>>>> >>>>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >>>>>>> >>>>>>> -- >>>>>>> Best regards, >>>>>>> Bogdan Dobrelya, >>>>>>> Irc #bogdando >>>>>>> >>>>>> >>>>>> FYI.. >>>>>> >>>>>> The issue w/ "too many requests" is back. Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. >>>>> >>>>> Working with the infra folks and we have identified the authorization >>>>> header as causing issues when we're rediected from docker.io to >>>>> cloudflare. I'll throw up a patch tomorrow to handle this case which >>>>> should improve our usage of the cache. It needs some testing against >>>>> other registries to ensure that we don't break authenticated fetching >>>>> of resources. >>>>> >>>> Thanks Alex! >>> >>> >>> >>> FYI.. we have been revisited by the container pull issue, "too many requests". >>> Alex has some fresh patches on it: https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 >>> >>> expect trouble in check and gate: >>> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 >>> > > -- Cédric Jeanneret (He/Him/His) Sr. 
Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From bdobreli at redhat.com Wed Aug 19 13:53:00 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 19 Aug 2020 15:53:00 +0200 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: <75d5fd38-f0bb-01eb-54a4-bfc3f0c42474@redhat.com> On 8/19/20 3:23 PM, Alex Schultz wrote: > On Wed, Aug 19, 2020 at 7:15 AM Luke Short wrote: >> >> Hey folks, >> >> All of the latest patches to address this have been merged in but we are still seeing this error randomly in CI jobs that involve an Undercloud or Standalone node. As far as I can tell, the error is appearing less often than before but it is still present making merging new patches difficult. I would be happy to help work towards other possible solutions however I am unsure where to start from here. Any help would be greatly appreciated. >> > > I'm looking at this today but from what I can tell the problem is > likely caused by a reduced anonymous query quota from docker.io and > our usage of the upstream mirrors. Because the mirrors essentially > funnel all requests through a single IP we're hitting limits faster > than if we didn't use the mirrors. Due to the nature of the requests, > the metadata queries don't get cached due to the authorization header > but are subject to the rate limiting. Additionally we're querying the > registry to determine which containers we need to update in CI because > we limit our updates to a certain set of containers as part of the CI > jobs. > > So there are likely a few different steps forward on this and we can > do a few of these together. > > 1) stop using mirrors (not ideal but likely makes this go away). > Alternatively switch stable branches off the mirrors due to a reduced > number of executions and leave mirrors configured on master only (or > vice versa). Also, the stable/(N-1) branch could use quay.io, while master keeps using docker.io (assuming containers for that N-1 release will be hosted there instead of the dockerhub) > 2) reduce the number of jobs > 3) stop querying the registry for the update filters (i'm looking into > this today) and use the information in tripleo-common first. > 4) build containers always instead of fetching from docker.io > > Thanks, > -Alex > > > >> Sincerely, >> Luke Short >> >> On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin wrote: >>> >>> >>> >>> On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin wrote: >>>> >>>> >>>> >>>> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: >>>>> >>>>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: >>>>>>> >>>>>>> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >>>>>>> > wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >>>>>>> > wrote: >>>>>>>> >>>>>>>> On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >>>>>>>> > wrote: >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >>>>>>>> > wrote: >>>>>>>> >> >>>>>>>> >> FYI... 
>>>>>>>> >> >>>>>>>> >> If you find your jobs are failing with an error similar to >>>>>>>> [1], you have been rate limited by docker.io >>>>>>>> via the upstream mirror system and have hit [2]. I've been >>>>>>>> discussing the issue w/ upstream infra, rdo-infra and a few CI >>>>>>>> engineers. >>>>>>>> >> >>>>>>>> >> There are a few ways to mitigate the issue however I don't >>>>>>>> see any of the options being completed very quickly so I'm >>>>>>>> asking for your patience while this issue is socialized and >>>>>>>> resolved. >>>>>>>> >> >>>>>>>> >> For full transparency we're considering the following options. >>>>>>>> >> >>>>>>>> >> 1. move off of docker.io to quay.io >>>>>>>> >>>>>>>> > >>>>>>>> > >>>>>>>> > quay.io also has API rate limit: >>>>>>>> > https://docs.quay.io/issues/429.html >>>>>>>> > >>>>>>>> > Now I'm not sure about how many requests per seconds one can >>>>>>>> do vs the other but this would need to be checked with the quay >>>>>>>> team before changing anything. >>>>>>>> > Also quay.io had its big downtimes as well, >>>>>>>> SLA needs to be considered. >>>>>>>> > >>>>>>>> >> 2. local container builds for each job in master, possibly >>>>>>>> ussuri >>>>>>>> > >>>>>>>> > >>>>>>>> > Not convinced. >>>>>>>> > You can look at CI logs: >>>>>>>> > - pulling / updating / pushing container images from >>>>>>>> docker.io to local registry takes ~10 min on >>>>>>>> standalone (OVH) >>>>>>>> > - building containers from scratch with updated repos and >>>>>>>> pushing them to local registry takes ~29 min on standalone (OVH). >>>>>>>> > >>>>>>>> >> >>>>>>>> >> 3. parent child jobs upstream where rpms and containers will >>>>>>>> be build and host artifacts for the child jobs >>>>>>>> > >>>>>>>> > >>>>>>>> > Yes, we need to investigate that. >>>>>>>> > >>>>>>>> >> >>>>>>>> >> 4. remove some portion of the upstream jobs to lower the >>>>>>>> impact we have on 3rd party infrastructure. >>>>>>>> > >>>>>>>> > >>>>>>>> > I'm not sure I understand this one, maybe you can give an >>>>>>>> example of what could be removed? >>>>>>>> >>>>>>>> We need to re-evaulate our use of scenarios (e.g. we have two >>>>>>>> scenario010's both are non-voting). There's a reason we >>>>>>>> historically >>>>>>>> didn't want to add more jobs because of these types of resource >>>>>>>> constraints. I think we've added new jobs recently and likely >>>>>>>> need to >>>>>>>> reduce what we run. Additionally we might want to look into reducing >>>>>>>> what we run on stable branches as well. >>>>>>>> >>>>>>>> >>>>>>>> Oh... removing jobs (I thought we would remove some steps of the jobs). >>>>>>>> Yes big +1, this should be a continuous goal when working on CI, and >>>>>>>> always evaluating what we need vs what we run now. >>>>>>>> >>>>>>>> We should look at: >>>>>>>> 1) services deployed in scenarios that aren't worth testing (e.g. >>>>>>>> deprecated or unused things) (and deprecate the unused things) >>>>>>>> 2) jobs themselves (I don't have any example beside scenario010 but >>>>>>>> I'm sure there are more). >>>>>>>> -- >>>>>>>> Emilien Macchi >>>>>>>> >>>>>>>> >>>>>>>> Thanks Alex, Emilien >>>>>>>> >>>>>>>> +1 to reviewing the catalog and adjusting things on an ongoing basis. >>>>>>>> >>>>>>>> All.. it looks like the issues with docker.io were >>>>>>>> more of a flare up than a change in docker.io policy >>>>>>>> or infrastructure [2]. The flare up started on July 27 8am utc and >>>>>>>> ended on July 27 17:00 utc, see screenshots. 
>>>>>>> >>>>>>> The numbers of image prepare workers and its exponential fallback >>>>>>> intervals should be also adjusted. I've analysed the log snippet [0] for >>>>>>> the connection reset counts by workers versus the times the rate >>>>>>> limiting was triggered. See the details in the reported bug [1]. >>>>>>> >>>>>>> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >>>>>>> >>>>>>> Conn Reset Counts by a Worker PID: >>>>>>> 3 58412 >>>>>>> 2 58413 >>>>>>> 3 58415 >>>>>>> 3 58417 >>>>>>> >>>>>>> which seems too much of (workers*reconnects) and triggers rate limiting >>>>>>> immediately. >>>>>>> >>>>>>> [0] >>>>>>> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >>>>>>> >>>>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >>>>>>> >>>>>>> -- >>>>>>> Best regards, >>>>>>> Bogdan Dobrelya, >>>>>>> Irc #bogdando >>>>>>> >>>>>> >>>>>> FYI.. >>>>>> >>>>>> The issue w/ "too many requests" is back. Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. >>>>> >>>>> Working with the infra folks and we have identified the authorization >>>>> header as causing issues when we're rediected from docker.io to >>>>> cloudflare. I'll throw up a patch tomorrow to handle this case which >>>>> should improve our usage of the cache. It needs some testing against >>>>> other registries to ensure that we don't break authenticated fetching >>>>> of resources. >>>>> >>>> Thanks Alex! >>> >>> >>> >>> FYI.. we have been revisited by the container pull issue, "too many requests". >>> Alex has some fresh patches on it: https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 >>> >>> expect trouble in check and gate: >>> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 >>> > -- Best regards, Bogdan Dobrelya, Irc #bogdando From aschultz at redhat.com Wed Aug 19 13:55:09 2020 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 19 Aug 2020 07:55:09 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: <75d5fd38-f0bb-01eb-54a4-bfc3f0c42474@redhat.com> References: <75d5fd38-f0bb-01eb-54a4-bfc3f0c42474@redhat.com> Message-ID: On Wed, Aug 19, 2020 at 7:53 AM Bogdan Dobrelya wrote: > > On 8/19/20 3:23 PM, Alex Schultz wrote: > > On Wed, Aug 19, 2020 at 7:15 AM Luke Short wrote: > >> > >> Hey folks, > >> > >> All of the latest patches to address this have been merged in but we are still seeing this error randomly in CI jobs that involve an Undercloud or Standalone node. As far as I can tell, the error is appearing less often than before but it is still present making merging new patches difficult. I would be happy to help work towards other possible solutions however I am unsure where to start from here. Any help would be greatly appreciated. > >> > > > > I'm looking at this today but from what I can tell the problem is > > likely caused by a reduced anonymous query quota from docker.io and > > our usage of the upstream mirrors. Because the mirrors essentially > > funnel all requests through a single IP we're hitting limits faster > > than if we didn't use the mirrors. 
Due to the nature of the requests, > > the metadata queries don't get cached due to the authorization header > > but are subject to the rate limiting. Additionally we're querying the > > registry to determine which containers we need to update in CI because > > we limit our updates to a certain set of containers as part of the CI > > jobs. > > > > So there are likely a few different steps forward on this and we can > > do a few of these together. > > > > 1) stop using mirrors (not ideal but likely makes this go away). > > Alternatively switch stable branches off the mirrors due to a reduced > > number of executions and leave mirrors configured on master only (or > > vice versa). > > Also, the stable/(N-1) branch could use quay.io, while master keeps > using docker.io (assuming containers for that N-1 release will be hosted > there instead of the dockerhub) > quay has its own limits and likely will suffer from a similar problem. > > 2) reduce the number of jobs > > 3) stop querying the registry for the update filters (i'm looking into > > this today) and use the information in tripleo-common first. > > 4) build containers always instead of fetching from docker.io > > > > Thanks, > > -Alex > > > > > > > >> Sincerely, > >> Luke Short > >> > >> On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin wrote: > >>> > >>> > >>> > >>> On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin wrote: > >>>> > >>>> > >>>> > >>>> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: > >>>>> > >>>>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: > >>>>>>> > >>>>>>> On 7/28/20 6:09 PM, Wesley Hayutin wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >>>>>>>> > wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >>>>>>>> > wrote: > >>>>>>>> > >>>>>>>> On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi > >>>>>>>> > wrote: > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin > >>>>>>>> > wrote: > >>>>>>>> >> > >>>>>>>> >> FYI... > >>>>>>>> >> > >>>>>>>> >> If you find your jobs are failing with an error similar to > >>>>>>>> [1], you have been rate limited by docker.io > >>>>>>>> via the upstream mirror system and have hit [2]. I've been > >>>>>>>> discussing the issue w/ upstream infra, rdo-infra and a few CI > >>>>>>>> engineers. > >>>>>>>> >> > >>>>>>>> >> There are a few ways to mitigate the issue however I don't > >>>>>>>> see any of the options being completed very quickly so I'm > >>>>>>>> asking for your patience while this issue is socialized and > >>>>>>>> resolved. > >>>>>>>> >> > >>>>>>>> >> For full transparency we're considering the following options. > >>>>>>>> >> > >>>>>>>> >> 1. move off of docker.io to quay.io > >>>>>>>> > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > quay.io also has API rate limit: > >>>>>>>> > https://docs.quay.io/issues/429.html > >>>>>>>> > > >>>>>>>> > Now I'm not sure about how many requests per seconds one can > >>>>>>>> do vs the other but this would need to be checked with the quay > >>>>>>>> team before changing anything. > >>>>>>>> > Also quay.io had its big downtimes as well, > >>>>>>>> SLA needs to be considered. > >>>>>>>> > > >>>>>>>> >> 2. local container builds for each job in master, possibly > >>>>>>>> ussuri > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > Not convinced. 
> >>>>>>>> > You can look at CI logs: > >>>>>>>> > - pulling / updating / pushing container images from > >>>>>>>> docker.io to local registry takes ~10 min on > >>>>>>>> standalone (OVH) > >>>>>>>> > - building containers from scratch with updated repos and > >>>>>>>> pushing them to local registry takes ~29 min on standalone (OVH). > >>>>>>>> > > >>>>>>>> >> > >>>>>>>> >> 3. parent child jobs upstream where rpms and containers will > >>>>>>>> be build and host artifacts for the child jobs > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > Yes, we need to investigate that. > >>>>>>>> > > >>>>>>>> >> > >>>>>>>> >> 4. remove some portion of the upstream jobs to lower the > >>>>>>>> impact we have on 3rd party infrastructure. > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > I'm not sure I understand this one, maybe you can give an > >>>>>>>> example of what could be removed? > >>>>>>>> > >>>>>>>> We need to re-evaulate our use of scenarios (e.g. we have two > >>>>>>>> scenario010's both are non-voting). There's a reason we > >>>>>>>> historically > >>>>>>>> didn't want to add more jobs because of these types of resource > >>>>>>>> constraints. I think we've added new jobs recently and likely > >>>>>>>> need to > >>>>>>>> reduce what we run. Additionally we might want to look into reducing > >>>>>>>> what we run on stable branches as well. > >>>>>>>> > >>>>>>>> > >>>>>>>> Oh... removing jobs (I thought we would remove some steps of the jobs). > >>>>>>>> Yes big +1, this should be a continuous goal when working on CI, and > >>>>>>>> always evaluating what we need vs what we run now. > >>>>>>>> > >>>>>>>> We should look at: > >>>>>>>> 1) services deployed in scenarios that aren't worth testing (e.g. > >>>>>>>> deprecated or unused things) (and deprecate the unused things) > >>>>>>>> 2) jobs themselves (I don't have any example beside scenario010 but > >>>>>>>> I'm sure there are more). > >>>>>>>> -- > >>>>>>>> Emilien Macchi > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks Alex, Emilien > >>>>>>>> > >>>>>>>> +1 to reviewing the catalog and adjusting things on an ongoing basis. > >>>>>>>> > >>>>>>>> All.. it looks like the issues with docker.io were > >>>>>>>> more of a flare up than a change in docker.io policy > >>>>>>>> or infrastructure [2]. The flare up started on July 27 8am utc and > >>>>>>>> ended on July 27 17:00 utc, see screenshots. > >>>>>>> > >>>>>>> The numbers of image prepare workers and its exponential fallback > >>>>>>> intervals should be also adjusted. I've analysed the log snippet [0] for > >>>>>>> the connection reset counts by workers versus the times the rate > >>>>>>> limiting was triggered. See the details in the reported bug [1]. > >>>>>>> > >>>>>>> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: > >>>>>>> > >>>>>>> Conn Reset Counts by a Worker PID: > >>>>>>> 3 58412 > >>>>>>> 2 58413 > >>>>>>> 3 58415 > >>>>>>> 3 58417 > >>>>>>> > >>>>>>> which seems too much of (workers*reconnects) and triggers rate limiting > >>>>>>> immediately. > >>>>>>> > >>>>>>> [0] > >>>>>>> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log > >>>>>>> > >>>>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 > >>>>>>> > >>>>>>> -- > >>>>>>> Best regards, > >>>>>>> Bogdan Dobrelya, > >>>>>>> Irc #bogdando > >>>>>>> > >>>>>> > >>>>>> FYI.. > >>>>>> > >>>>>> The issue w/ "too many requests" is back. 
Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. > >>>>> > >>>>> Working with the infra folks and we have identified the authorization > >>>>> header as causing issues when we're rediected from docker.io to > >>>>> cloudflare. I'll throw up a patch tomorrow to handle this case which > >>>>> should improve our usage of the cache. It needs some testing against > >>>>> other registries to ensure that we don't break authenticated fetching > >>>>> of resources. > >>>>> > >>>> Thanks Alex! > >>> > >>> > >>> > >>> FYI.. we have been revisited by the container pull issue, "too many requests". > >>> Alex has some fresh patches on it: https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 > >>> > >>> expect trouble in check and gate: > >>> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 > >>> > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > From bdobreli at redhat.com Wed Aug 19 14:31:10 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 19 Aug 2020 16:31:10 +0200 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: <75d5fd38-f0bb-01eb-54a4-bfc3f0c42474@redhat.com> Message-ID: <6ac5db1c-4750-de2d-627f-edc68e3da516@redhat.com> On 8/19/20 3:55 PM, Alex Schultz wrote: > On Wed, Aug 19, 2020 at 7:53 AM Bogdan Dobrelya wrote: >> >> On 8/19/20 3:23 PM, Alex Schultz wrote: >>> On Wed, Aug 19, 2020 at 7:15 AM Luke Short wrote: >>>> >>>> Hey folks, >>>> >>>> All of the latest patches to address this have been merged in but we are still seeing this error randomly in CI jobs that involve an Undercloud or Standalone node. As far as I can tell, the error is appearing less often than before but it is still present making merging new patches difficult. I would be happy to help work towards other possible solutions however I am unsure where to start from here. Any help would be greatly appreciated. >>>> >>> >>> I'm looking at this today but from what I can tell the problem is >>> likely caused by a reduced anonymous query quota from docker.io and >>> our usage of the upstream mirrors. Because the mirrors essentially >>> funnel all requests through a single IP we're hitting limits faster >>> than if we didn't use the mirrors. Due to the nature of the requests, >>> the metadata queries don't get cached due to the authorization header >>> but are subject to the rate limiting. Additionally we're querying the >>> registry to determine which containers we need to update in CI because >>> we limit our updates to a certain set of containers as part of the CI >>> jobs. >>> >>> So there are likely a few different steps forward on this and we can >>> do a few of these together. >>> >>> 1) stop using mirrors (not ideal but likely makes this go away). >>> Alternatively switch stable branches off the mirrors due to a reduced >>> number of executions and leave mirrors configured on master only (or >>> vice versa). >> >> Also, the stable/(N-1) branch could use quay.io, while master keeps >> using docker.io (assuming containers for that N-1 release will be hosted >> there instead of the dockerhub) >> > > quay has its own limits and likely will suffer from a similar problem. Right. But dropped numbers of total requests sent to each registry could end up with less often rate limiting by either of two. 
> >>> 2) reduce the number of jobs >>> 3) stop querying the registry for the update filters (i'm looking into >>> this today) and use the information in tripleo-common first. >>> 4) build containers always instead of fetching from docker.io There may be a middle-ground solution. Building it only once for each patchset executed in TripleO Zuul pipelines. Transient images, like [0], that can have TTL and self-expire should be used for that purpose. [0] https://idbs-engineering.com/containers/2019/08/27/auto-expiry-quayio-tags.html That would require the zuul jobs with dependencies passing ansible variables to each other, by the execution results. Can that be done? Pretty much like we have it already set in TripleO for tox jobs as a dependency for standalone/multinode jobs. But adding an extra step to prepare such a transient pack of the container images (only to be used for that patchset) and push it to a quay registry hosted elsewhere by TripleO devops folks. Then the jobs that have that dependency met can use those transient images via an ansible variable passed for the jobs. Auto expiration solves the space/lifecycle requirements for the cloud that will be hosting that registry. >>> >>> Thanks, >>> -Alex >>> >>> >>> >>>> Sincerely, >>>> Luke Short >>>> >>>> On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin wrote: >>>>> >>>>> >>>>> >>>>> On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: >>>>>>> >>>>>>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: >>>>>>>>> >>>>>>>>> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>> On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >>>>>>>>>> > wrote: >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >>>>>>>>>> > wrote: >>>>>>>>>> >> >>>>>>>>>> >> FYI... >>>>>>>>>> >> >>>>>>>>>> >> If you find your jobs are failing with an error similar to >>>>>>>>>> [1], you have been rate limited by docker.io >>>>>>>>>> via the upstream mirror system and have hit [2]. I've been >>>>>>>>>> discussing the issue w/ upstream infra, rdo-infra and a few CI >>>>>>>>>> engineers. >>>>>>>>>> >> >>>>>>>>>> >> There are a few ways to mitigate the issue however I don't >>>>>>>>>> see any of the options being completed very quickly so I'm >>>>>>>>>> asking for your patience while this issue is socialized and >>>>>>>>>> resolved. >>>>>>>>>> >> >>>>>>>>>> >> For full transparency we're considering the following options. >>>>>>>>>> >> >>>>>>>>>> >> 1. move off of docker.io to quay.io >>>>>>>>>> >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > quay.io also has API rate limit: >>>>>>>>>> > https://docs.quay.io/issues/429.html >>>>>>>>>> > >>>>>>>>>> > Now I'm not sure about how many requests per seconds one can >>>>>>>>>> do vs the other but this would need to be checked with the quay >>>>>>>>>> team before changing anything. >>>>>>>>>> > Also quay.io had its big downtimes as well, >>>>>>>>>> SLA needs to be considered. >>>>>>>>>> > >>>>>>>>>> >> 2. local container builds for each job in master, possibly >>>>>>>>>> ussuri >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > Not convinced. 
>>>>>>>>>> > You can look at CI logs: >>>>>>>>>> > - pulling / updating / pushing container images from >>>>>>>>>> docker.io to local registry takes ~10 min on >>>>>>>>>> standalone (OVH) >>>>>>>>>> > - building containers from scratch with updated repos and >>>>>>>>>> pushing them to local registry takes ~29 min on standalone (OVH). >>>>>>>>>> > >>>>>>>>>> >> >>>>>>>>>> >> 3. parent child jobs upstream where rpms and containers will >>>>>>>>>> be build and host artifacts for the child jobs >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > Yes, we need to investigate that. >>>>>>>>>> > >>>>>>>>>> >> >>>>>>>>>> >> 4. remove some portion of the upstream jobs to lower the >>>>>>>>>> impact we have on 3rd party infrastructure. >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > I'm not sure I understand this one, maybe you can give an >>>>>>>>>> example of what could be removed? >>>>>>>>>> >>>>>>>>>> We need to re-evaulate our use of scenarios (e.g. we have two >>>>>>>>>> scenario010's both are non-voting). There's a reason we >>>>>>>>>> historically >>>>>>>>>> didn't want to add more jobs because of these types of resource >>>>>>>>>> constraints. I think we've added new jobs recently and likely >>>>>>>>>> need to >>>>>>>>>> reduce what we run. Additionally we might want to look into reducing >>>>>>>>>> what we run on stable branches as well. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Oh... removing jobs (I thought we would remove some steps of the jobs). >>>>>>>>>> Yes big +1, this should be a continuous goal when working on CI, and >>>>>>>>>> always evaluating what we need vs what we run now. >>>>>>>>>> >>>>>>>>>> We should look at: >>>>>>>>>> 1) services deployed in scenarios that aren't worth testing (e.g. >>>>>>>>>> deprecated or unused things) (and deprecate the unused things) >>>>>>>>>> 2) jobs themselves (I don't have any example beside scenario010 but >>>>>>>>>> I'm sure there are more). >>>>>>>>>> -- >>>>>>>>>> Emilien Macchi >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks Alex, Emilien >>>>>>>>>> >>>>>>>>>> +1 to reviewing the catalog and adjusting things on an ongoing basis. >>>>>>>>>> >>>>>>>>>> All.. it looks like the issues with docker.io were >>>>>>>>>> more of a flare up than a change in docker.io policy >>>>>>>>>> or infrastructure [2]. The flare up started on July 27 8am utc and >>>>>>>>>> ended on July 27 17:00 utc, see screenshots. >>>>>>>>> >>>>>>>>> The numbers of image prepare workers and its exponential fallback >>>>>>>>> intervals should be also adjusted. I've analysed the log snippet [0] for >>>>>>>>> the connection reset counts by workers versus the times the rate >>>>>>>>> limiting was triggered. See the details in the reported bug [1]. >>>>>>>>> >>>>>>>>> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >>>>>>>>> >>>>>>>>> Conn Reset Counts by a Worker PID: >>>>>>>>> 3 58412 >>>>>>>>> 2 58413 >>>>>>>>> 3 58415 >>>>>>>>> 3 58417 >>>>>>>>> >>>>>>>>> which seems too much of (workers*reconnects) and triggers rate limiting >>>>>>>>> immediately. >>>>>>>>> >>>>>>>>> [0] >>>>>>>>> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >>>>>>>>> >>>>>>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best regards, >>>>>>>>> Bogdan Dobrelya, >>>>>>>>> Irc #bogdando >>>>>>>>> >>>>>>>> >>>>>>>> FYI.. >>>>>>>> >>>>>>>> The issue w/ "too many requests" is back. 
Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. >>>>>>> >>>>>>> Working with the infra folks and we have identified the authorization >>>>>>> header as causing issues when we're rediected from docker.io to >>>>>>> cloudflare. I'll throw up a patch tomorrow to handle this case which >>>>>>> should improve our usage of the cache. It needs some testing against >>>>>>> other registries to ensure that we don't break authenticated fetching >>>>>>> of resources. >>>>>>> >>>>>> Thanks Alex! >>>>> >>>>> >>>>> >>>>> FYI.. we have been revisited by the container pull issue, "too many requests". >>>>> Alex has some fresh patches on it: https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 >>>>> >>>>> expect trouble in check and gate: >>>>> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 >>>>> >>> >> >> >> -- >> Best regards, >> Bogdan Dobrelya, >> Irc #bogdando >> > -- Best regards, Bogdan Dobrelya, Irc #bogdando From bdobreli at redhat.com Wed Aug 19 14:34:34 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 19 Aug 2020 16:34:34 +0200 Subject: [tripleo][ci] container pulls failing In-Reply-To: <6ac5db1c-4750-de2d-627f-edc68e3da516@redhat.com> References: <75d5fd38-f0bb-01eb-54a4-bfc3f0c42474@redhat.com> <6ac5db1c-4750-de2d-627f-edc68e3da516@redhat.com> Message-ID: On 8/19/20 4:31 PM, Bogdan Dobrelya wrote: > On 8/19/20 3:55 PM, Alex Schultz wrote: >> On Wed, Aug 19, 2020 at 7:53 AM Bogdan Dobrelya >> wrote: >>> >>> On 8/19/20 3:23 PM, Alex Schultz wrote: >>>> On Wed, Aug 19, 2020 at 7:15 AM Luke Short wrote: >>>>> >>>>> Hey folks, >>>>> >>>>> All of the latest patches to address this have been merged in but >>>>> we are still seeing this error randomly in CI jobs that involve an >>>>> Undercloud or Standalone node. As far as I can tell, the error is >>>>> appearing less often than before but it is still present making >>>>> merging new patches difficult. I would be happy to help work >>>>> towards other possible solutions however I am unsure where to start >>>>> from here. Any help would be greatly appreciated. >>>>> >>>> >>>> I'm looking at this today but from what I can tell the problem is >>>> likely caused by a reduced anonymous query quota from docker.io and >>>> our usage of the upstream mirrors.  Because the mirrors essentially >>>> funnel all requests through a single IP we're hitting limits faster >>>> than if we didn't use the mirrors. Due to the nature of the requests, >>>> the metadata queries don't get cached due to the authorization header >>>> but are subject to the rate limiting.  Additionally we're querying the >>>> registry to determine which containers we need to update in CI because >>>> we limit our updates to a certain set of containers as part of the CI >>>> jobs. >>>> >>>> So there are likely a few different steps forward on this and we can >>>> do a few of these together. >>>> >>>> 1) stop using mirrors (not ideal but likely makes this go away). >>>> Alternatively switch stable branches off the mirrors due to a reduced >>>> number of executions and leave mirrors configured on master only (or >>>> vice versa). 
>>> >>> Also, the stable/(N-1) branch could use quay.io, while master keeps >>> using docker.io (assuming containers for that N-1 release will be hosted >>> there instead of the dockerhub) >>> >> >> quay has its own limits and likely will suffer from a similar problem. > > Right. But dropped numbers of total requests sent to each registry could > end up with less often rate limiting by either of two. > >> >>>> 2) reduce the number of jobs >>>> 3) stop querying the registry for the update filters (i'm looking into >>>> this today) and use the information in tripleo-common first. >>>> 4) build containers always instead of fetching from docker.io > > There may be a middle-ground solution. Building it only once for each > patchset executed in TripleO Zuul pipelines. Transient images, like [0], > that can have TTL and self-expire should be used for that purpose. > > [0] > https://idbs-engineering.com/containers/2019/08/27/auto-expiry-quayio-tags.html > > > That would require the zuul jobs with dependencies passing ansible > variables to each other, by the execution results. Can that be done? ...or even simpler than that, predictable names can be created for those transient images, like /_ > > Pretty much like we have it already set in TripleO for tox jobs as a > dependency for standalone/multinode jobs. But adding an extra step to > prepare such a transient pack of the container images (only to be used > for that patchset) and push it to a quay registry hosted elsewhere by > TripleO devops folks. > > Then the jobs that have that dependency met can use those transient > images via an ansible variable passed for the jobs. Auto expiration > solves the space/lifecycle requirements for the cloud that will be > hosting that registry. > >>>> >>>> Thanks, >>>> -Alex >>>> >>>> >>>> >>>>> Sincerely, >>>>>       Luke Short >>>>> >>>>> On Wed, Aug 5, 2020 at 12:26 PM Wesley Hayutin >>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jul 29, 2020 at 4:48 PM Wesley Hayutin >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz >>>>>>> wrote: >>>>>>>> >>>>>>>> On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >>>>>>>>>>> >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>       On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >>>>>>>>>>> >>>>>>>>>>       > wrote: >>>>>>>>>>> >>>>>>>>>>>           On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >>>>>>>>>>>           > >>>>>>>>>>> wrote: >>>>>>>>>>>            > >>>>>>>>>>>            > >>>>>>>>>>>            > >>>>>>>>>>>            > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >>>>>>>>>>>           > >>>>>>>>>>> wrote: >>>>>>>>>>>            >> >>>>>>>>>>>            >> FYI... >>>>>>>>>>>            >> >>>>>>>>>>>            >> If you find your jobs are failing with an error >>>>>>>>>>> similar to >>>>>>>>>>>           [1], you have been rate limited by docker.io >>>>>>>>>>> >>>>>>>>>>>           via the upstream mirror system and have hit [2]. >>>>>>>>>>> I've been >>>>>>>>>>>           discussing the issue w/ upstream infra, rdo-infra >>>>>>>>>>> and a few CI >>>>>>>>>>>           engineers. 
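To make that idea concrete, here is a rough sketch of how a predictable, self-expiring per-patchset image could be produced. It assumes a change/patchset-based naming scheme; the registry, image name and environment variable names are made up for illustration, and TripleO's real builds go through kolla/buildah rather than a bare docker CLI.

# Rough sketch only -- names, variables and registry are hypothetical.
import os
import subprocess

registry = "quay.example.org/tripleo-transient"     # hypothetical registry
change = os.environ.get("ZUUL_CHANGE", "741228")    # illustrative variable
patchset = os.environ.get("ZUUL_PATCHSET", "6")     # illustrative variable

# Predictable per-patchset tag, e.g. .../nova-api:741228_6
tag = f"{registry}/nova-api:{change}_{patchset}"

# quay.io expires tags whose image carries the quay.expires-after label
# (the auto-expiry mechanism from link [0] above), so the transient image
# cleans itself up without extra tooling.
subprocess.run(
    ["docker", "build", "--label", "quay.expires-after=48h", "-t", tag, "."],
    check=True,
)
subprocess.run(["docker", "push", tag], check=True)

With a naming scheme like that, jobs depending on the build job only need the change/patchset pair to reconstruct the same tag, rather than passing artifacts or variables between jobs.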
>>>>>>>>>>>            >> >>>>>>>>>>>            >> There are a few ways to mitigate the issue >>>>>>>>>>> however I don't >>>>>>>>>>>           see any of the options being completed very quickly >>>>>>>>>>> so I'm >>>>>>>>>>>           asking for your patience while this issue is >>>>>>>>>>> socialized and >>>>>>>>>>>           resolved. >>>>>>>>>>>            >> >>>>>>>>>>>            >> For full transparency we're considering the >>>>>>>>>>> following options. >>>>>>>>>>>            >> >>>>>>>>>>>            >> 1. move off of docker.io to >>>>>>>>>>> quay.io >>>>>>>>>>>           >>>>>>>>>>>            > >>>>>>>>>>>            > >>>>>>>>>>>            > quay.io also has API rate limit: >>>>>>>>>>>            > https://docs.quay.io/issues/429.html >>>>>>>>>>>            > >>>>>>>>>>>            > Now I'm not sure about how many requests per >>>>>>>>>>> seconds one can >>>>>>>>>>>           do vs the other but this would need to be checked >>>>>>>>>>> with the quay >>>>>>>>>>>           team before changing anything. >>>>>>>>>>>            > Also quay.io had its big >>>>>>>>>>> downtimes as well, >>>>>>>>>>>           SLA needs to be considered. >>>>>>>>>>>            > >>>>>>>>>>>            >> 2. local container builds for each job in >>>>>>>>>>> master, possibly >>>>>>>>>>>           ussuri >>>>>>>>>>>            > >>>>>>>>>>>            > >>>>>>>>>>>            > Not convinced. >>>>>>>>>>>            > You can look at CI logs: >>>>>>>>>>>            > - pulling / updating / pushing container images >>>>>>>>>>> from >>>>>>>>>>>           docker.io to local registry >>>>>>>>>>> takes ~10 min on >>>>>>>>>>>           standalone (OVH) >>>>>>>>>>>            > - building containers from scratch with updated >>>>>>>>>>> repos and >>>>>>>>>>>           pushing them to local registry takes ~29 min on >>>>>>>>>>> standalone (OVH). >>>>>>>>>>>            > >>>>>>>>>>>            >> >>>>>>>>>>>            >> 3. parent child jobs upstream where rpms and >>>>>>>>>>> containers will >>>>>>>>>>>           be build and host artifacts for the child jobs >>>>>>>>>>>            > >>>>>>>>>>>            > >>>>>>>>>>>            > Yes, we need to investigate that. >>>>>>>>>>>            > >>>>>>>>>>>            >> >>>>>>>>>>>            >> 4. remove some portion of the upstream jobs to >>>>>>>>>>> lower the >>>>>>>>>>>           impact we have on 3rd party infrastructure. >>>>>>>>>>>            > >>>>>>>>>>>            > >>>>>>>>>>>            > I'm not sure I understand this one, maybe you >>>>>>>>>>> can give an >>>>>>>>>>>           example of what could be removed? >>>>>>>>>>> >>>>>>>>>>>           We need to re-evaulate our use of scenarios (e.g. >>>>>>>>>>> we have two >>>>>>>>>>>           scenario010's both are non-voting).  There's a >>>>>>>>>>> reason we >>>>>>>>>>>           historically >>>>>>>>>>>           didn't want to add more jobs because of these types >>>>>>>>>>> of resource >>>>>>>>>>>           constraints.  I think we've added new jobs recently >>>>>>>>>>> and likely >>>>>>>>>>>           need to >>>>>>>>>>>           reduce what we run. Additionally we might want to >>>>>>>>>>> look into reducing >>>>>>>>>>>           what we run on stable branches as well. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>       Oh... removing jobs (I thought we would remove some >>>>>>>>>>> steps of the jobs). >>>>>>>>>>>       Yes big +1, this should be a continuous goal when >>>>>>>>>>> working on CI, and >>>>>>>>>>>       always evaluating what we need vs what we run now. 
>>>>>>>>>>> >>>>>>>>>>>       We should look at: >>>>>>>>>>>       1) services deployed in scenarios that aren't worth >>>>>>>>>>> testing (e.g. >>>>>>>>>>>       deprecated or unused things) (and deprecate the unused >>>>>>>>>>> things) >>>>>>>>>>>       2) jobs themselves (I don't have any example beside >>>>>>>>>>> scenario010 but >>>>>>>>>>>       I'm sure there are more). >>>>>>>>>>>       -- >>>>>>>>>>>       Emilien Macchi >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks Alex, Emilien >>>>>>>>>>> >>>>>>>>>>> +1 to reviewing the catalog and adjusting things on an >>>>>>>>>>> ongoing basis. >>>>>>>>>>> >>>>>>>>>>> All.. it looks like the issues with docker.io >>>>>>>>>>> were >>>>>>>>>>> more of a flare up than a change in docker.io >>>>>>>>>>> policy >>>>>>>>>>> or infrastructure [2].  The flare up started on July 27 8am >>>>>>>>>>> utc and >>>>>>>>>>> ended on July 27 17:00 utc, see screenshots. >>>>>>>>>> >>>>>>>>>> The numbers of image prepare workers and its exponential fallback >>>>>>>>>> intervals should be also adjusted. I've analysed the log >>>>>>>>>> snippet [0] for >>>>>>>>>> the connection reset counts by workers versus the times the rate >>>>>>>>>> limiting was triggered. See the details in the reported bug [1]. >>>>>>>>>> >>>>>>>>>> tl;dr -- for an example 5 sec interval 03:55:31,379 - >>>>>>>>>> 03:55:36,110: >>>>>>>>>> >>>>>>>>>> Conn Reset Counts by a Worker PID: >>>>>>>>>>          3 58412 >>>>>>>>>>          2 58413 >>>>>>>>>>          3 58415 >>>>>>>>>>          3 58417 >>>>>>>>>> >>>>>>>>>> which seems too much of (workers*reconnects) and triggers rate >>>>>>>>>> limiting >>>>>>>>>> immediately. >>>>>>>>>> >>>>>>>>>> [0] >>>>>>>>>> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best regards, >>>>>>>>>> Bogdan Dobrelya, >>>>>>>>>> Irc #bogdando >>>>>>>>>> >>>>>>>>> >>>>>>>>> FYI.. >>>>>>>>> >>>>>>>>> The issue w/ "too many requests" is back.  Expect delays and >>>>>>>>> failures in attempting to merge your patches upstream across >>>>>>>>> all branches.   The issue is being tracked as a critical issue. >>>>>>>> >>>>>>>> Working with the infra folks and we have identified the >>>>>>>> authorization >>>>>>>> header as causing issues when we're rediected from docker.io to >>>>>>>> cloudflare. I'll throw up a patch tomorrow to handle this case >>>>>>>> which >>>>>>>> should improve our usage of the cache.  It needs some testing >>>>>>>> against >>>>>>>> other registries to ensure that we don't break authenticated >>>>>>>> fetching >>>>>>>> of resources. >>>>>>>> >>>>>>> Thanks Alex! >>>>>> >>>>>> >>>>>> >>>>>> FYI.. we have been revisited by the container pull issue, "too >>>>>> many requests". 
>>>>>> Alex has some fresh patches on it: >>>>>> https://review.opendev.org/#/q/status:open+project:openstack/tripleo-common+topic:bug/1889122 >>>>>> >>>>>> >>>>>> expect trouble in check and gate: >>>>>> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22429%20Client%20Error%3A%20Too%20Many%20Requests%20for%20url%3A%5C%22%20AND%20voting%3A1 >>>>>> >>>>>> >>>> >>> >>> >>> -- >>> Best regards, >>> Bogdan Dobrelya, >>> Irc #bogdando >>> >> > > -- Best regards, Bogdan Dobrelya, Irc #bogdando From tobias.urdin at binero.com Wed Aug 19 14:38:42 2020 From: tobias.urdin at binero.com (Tobias Urdin) Date: Wed, 19 Aug 2020 14:38:42 +0000 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: , Message-ID: <1597847922905.32607@binero.com> ?Big +1 from an outsider :)) Best regards Tobias ________________________________ From: Rabi Mishra Sent: Wednesday, August 19, 2020 3:37 PM To: Emilien Macchi Cc: openstack-discuss Subject: Re: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo +1 On Tue, Aug 18, 2020 at 8:03 PM Emilien Macchi > wrote: Hi people, If you don't know Takashi yet, he has been involved in the Puppet OpenStack project and helped *a lot* in its maintenance (and by maintenance I mean not-funny-work). When our community was getting smaller and smaller, he joined us and our review velicity went back to eleven. He became a core maintainer very quickly and we're glad to have him onboard. He's also been involved in taking care of puppet-tripleo for a few months and I believe he has more than enough knowledge on the module to provide core reviews and be part of the core maintainer group. I also noticed his amount of contribution (bug fixes, improvements, reviews, etc) in other TripleO repos and I'm confident he'll make his road to be core in TripleO at some point. For now I would like him to propose him to be core in puppet-tripleo. As usual, any feedback is welcome but in the meantime I want to thank Takashi for his work in TripleO and we're super happy to have new contributors! Thanks, -- Emilien Macchi -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Aug 19 14:52:17 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 19 Aug 2020 09:52:17 -0500 Subject: [simplification] Making ask.openstack.org read-only In-Reply-To: References: <20200818235247.GA341779@fedora19.localdomain> <20200819000359.mhz43jvop5vtcgct@yuggoth.org> Message-ID: <1740734d7ec.ed463568528549.2297830654288026424@ghanshyammann.com> ---- On Tue, 18 Aug 2020 19:35:05 -0500 Michael Johnson wrote ---- > Yes! ask.openstack.org is no fun to attempt to be helpful on (see > e-mail notification issues, etc.). > +1 on making it RO and redirect users to StackOverflow or ML(fast response).. > I would like to ask that we put together some sort of guide and/or > guidence for how to use stack overflow efficiently for OpenStack > questions. I.e. some well known or defined tags that we recommend > people use when asking questions. This would be similar to the tags we > use for the openstack discuss list. > > I see that there is already a trend for "openstack-nova" > "openstack-horizon", etc. This works for me. In FC SIG, we check a set of tags for new contributors in ask.o.o [1] which we can switch to do in StackOverflow. Similarly, we can start monitoring the popular tags for project/area-specific. 
[1] https://wiki.openstack.org/wiki/First_Contact_SIG#Biweekly_Homework -gmann > > This way we can setup notifications for these tags and be much more > efficient at getting people answers. > > Thanks Thierry for moving this forward! > > Michael > > On Tue, Aug 18, 2020 at 5:10 PM Jeremy Stanley wrote: > > > > On 2020-08-19 09:52:47 +1000 (+1000), Ian Wienand wrote: > > [...] > > > *If* we were to restore it now, it looks like 0.11 branch comes with > > > an upstream Dockerfile [1]; there's lots of examples now in > > > system-config of similar container-based production sites and this > > > could fit in. > > > > > > This makes it significantly easier than trying to build up everything > > > it requires from scratch, and if upstream keep their container > > > compatible (a big if...) theoretically less work to keep updated. > > [...] > > > > Which also brings up another point: right now we're running it on > > Ubuntu Xenial (16.04 LTS) which is scheduled to reach EOL early next > > year, and the tooling we're using to deploy it isn't going to work > > on newer Ubuntu releases. Even keeping it up in a read-only state is > > timebound to how long we can safely keep its server online. If we > > switch ask.openstack.org to read-only now, I would still plan to > > turn it off entirely on or before April 1, 2021. > > -- > > Jeremy Stanley > > From fungi at yuggoth.org Wed Aug 19 14:53:52 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 19 Aug 2020 14:53:52 +0000 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: <20200819145352.ezxr6kvvpsq3tgui@yuggoth.org> On 2020-08-19 15:40:08 +0200 (+0200), Cédric Jeanneret wrote: > On 8/19/20 3:23 PM, Alex Schultz wrote: [...] > > 1) stop using mirrors (not ideal but likely makes this go away). > > Alternatively switch stable branches off the mirrors due to a reduced > > number of executions and leave mirrors configured on master only (or > > vice versa). > > might be good, but it might lead to some other issues - docker might > want to rate-limit on container owner. I wouldn't be surprised if they > go that way in the future. Could be OK as a first "unlocking step". [...] Be aware that there is another side effect: right now the images are being served from a cache within the same environment as the test nodes, and instead your jobs will begin fetching them over the Internet. This may mean longer average job run time, and a higher percentage of download failures due to network hiccups (whether these will be of a greater frequency than the API rate limit blocking, it's hard to guess). It also necessarily means significantly more bandwidth utilization for our resource donors, particularly as TripleO consumes far more job resources than any other project already. I wonder if there's a middle ground: finding a way to use the cache for fetching images, but connecting straight to Dockerhub when you're querying metadata? It sounds like the metadata requests represent a majority of the actual Dockerhub API calls anyway, and can't be cached regardless. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From aschultz at redhat.com Wed Aug 19 15:14:27 2020 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 19 Aug 2020 09:14:27 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: <20200819145352.ezxr6kvvpsq3tgui@yuggoth.org> References: <20200819145352.ezxr6kvvpsq3tgui@yuggoth.org> Message-ID: On Wed, Aug 19, 2020 at 8:59 AM Jeremy Stanley wrote: > > On 2020-08-19 15:40:08 +0200 (+0200), Cédric Jeanneret wrote: > > On 8/19/20 3:23 PM, Alex Schultz wrote: > [...] > > > 1) stop using mirrors (not ideal but likely makes this go away). > > > Alternatively switch stable branches off the mirrors due to a reduced > > > number of executions and leave mirrors configured on master only (or > > > vice versa). > > > > might be good, but it might lead to some other issues - docker might > > want to rate-limit on container owner. I wouldn't be surprised if they > > go that way in the future. Could be OK as a first "unlocking step". > [...] > > Be aware that there is another side effect: right now the images are > being served from a cache within the same environment as the test > nodes, and instead your jobs will begin fetching them over the > Internet. This may mean longer average job run time, and a higher > percentage of download failures due to network hiccups (whether > these will be of a greater frequency than the API rate limit > blocking, it's hard to guess). It also necessarily means > significantly more bandwidth utilization for our resource donors, > particularly as TripleO consumes far more job resources than any > other project already. > Yea I know so we're trying to find a solution that doesn't make it worse. It would be great if we could have any visibility into the cache hit ratio/requests going through these mirrors to know if we have changes that are improving things or making it worse. > I wonder if there's a middle ground: finding a way to use the cache > for fetching images, but connecting straight to Dockerhub when > you're querying metadata? It sounds like the metadata requests > represent a majority of the actual Dockerhub API calls anyway, and > can't be cached regardless. Maybe, but at the moment i'm working on not even doing the requests at all which would be better. Next i'll look into that but the mirror config is handled before we even start requesting things > -- > Jeremy Stanley From fungi at yuggoth.org Wed Aug 19 15:45:40 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 19 Aug 2020 15:45:40 +0000 Subject: [tripleo][ci]i[infra] container pulls failing In-Reply-To: References: <20200819145352.ezxr6kvvpsq3tgui@yuggoth.org> Message-ID: <20200819154540.es5nru4xmzj637rx@yuggoth.org> On 2020-08-19 09:14:27 -0600 (-0600), Alex Schultz wrote: [...] > It would be great if we could have any visibility into the cache > hit ratio/requests going through these mirrors to know if we have > changes that are improving things or making it worse. [...] Normally we avoid publishing raw Web server logs to protect the privacy of our users, but in this case we might make an exception because the mirrors are only intended for use by our public Zuul jobs and Nodepool image builds. It's worth bringing up with the rest of the team, for sure. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From openstack at nemebean.com Wed Aug 19 16:27:01 2020 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 19 Aug 2020 11:27:01 -0500 Subject: [neutron] Disable dhcp drop rule In-Reply-To: <20200819133616.Horde.zhXC_mhe4RdzjbP4Shl1M45@webmail.nde.ag> References: <20200819133616.Horde.zhXC_mhe4RdzjbP4Shl1M45@webmail.nde.ag> Message-ID: <4ea4eb17-0373-e1ab-6f45-c35cb67723e0@nemebean.com> On 8/19/20 8:36 AM, Eugen Block wrote: > Hi *, > > we recently upgraded our Ocata Cloud to Train and also switched from > linuxbridge to openvswitch. > > One of our instances within the cloud works as DHCP server and to make > that work we had to comment the respective part in this file on the > compute node the instance was running on: > > /usr/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py > > > Now we tried the same in > > /usr/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py > > /usr/lib/python3.6/site-packages/neutron/agent/linux/iptables_firewall.py > > but restarting openstack-neutron-openvswitch-agent.service didn't drop > that rule, the DHCP reply didn't get through. To continue with our work > we just dropped it manually, so we get by, but since there have been a > couple of years between Ocata and Train, is there any smoother or better > way to achieve this? This seems to be a reoccuring request but I > couldn't find any updates on this topic. Maybe someone here can shed > some light? Is there more to change than those two files I mentioned? You might try disabling port-security on the instance's port. That's what we use in OVB to allow a DHCP server in an instance now. neutron port-update [port-id] --port_security_enabled=False That will drop all port security for that instance, not just the DHCP rule, but on the other hand it leaves the DHCP rule in place for any instances you don't want running DHCP servers. > > Any pointers are highly appreciated! > > Best regards, > Eugen > > From eblock at nde.ag Wed Aug 19 16:42:11 2020 From: eblock at nde.ag (Eugen Block) Date: Wed, 19 Aug 2020 16:42:11 +0000 Subject: [neutron] Disable dhcp drop rule In-Reply-To: <4ea4eb17-0373-e1ab-6f45-c35cb67723e0@nemebean.com> References: <20200819133616.Horde.zhXC_mhe4RdzjbP4Shl1M45@webmail.nde.ag> <4ea4eb17-0373-e1ab-6f45-c35cb67723e0@nemebean.com> Message-ID: <20200819164211.Horde.jx_dhmZz16BL7k9bIumarOA@webmail.nde.ag> That sounds promising, thank you! I had noticed that option but didn’t have a chance to look closer into it. I’ll try that tomorrow. Thanks for the tip! Zitat von Ben Nemec : > On 8/19/20 8:36 AM, Eugen Block wrote: >> Hi *, >> >> we recently upgraded our Ocata Cloud to Train and also switched >> from linuxbridge to openvswitch. >> >> One of our instances within the cloud works as DHCP server and to >> make that work we had to comment the respective part in this file >> on the compute node the instance was running on: >> >> /usr/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py >> >> >> Now we tried the same in >> >> /usr/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py >> /usr/lib/python3.6/site-packages/neutron/agent/linux/iptables_firewall.py >> >> but restarting openstack-neutron-openvswitch-agent.service didn't >> drop that rule, the DHCP reply didn't get through. 
To continue with >> our work we just dropped it manually, so we get by, but since there >> have been a couple of years between Ocata and Train, is there any >> smoother or better way to achieve this? This seems to be a >> reoccuring request but I couldn't find any updates on this topic. >> Maybe someone here can shed some light? Is there more to change >> than those two files I mentioned? > > You might try disabling port-security on the instance's port. That's > what we use in OVB to allow a DHCP server in an instance now. > > neutron port-update [port-id] --port_security_enabled=False > > That will drop all port security for that instance, not just the > DHCP rule, but on the other hand it leaves the DHCP rule in place > for any instances you don't want running DHCP servers. > >> >> Any pointers are highly appreciated! >> >> Best regards, >> Eugen >> >> From rosmaita.fossdev at gmail.com Wed Aug 19 16:49:53 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 19 Aug 2020 12:49:53 -0400 Subject: [cinder][ops] "Berlin" 2020 Virtual Forum - Cinder brainstorming Message-ID: <8f233106-a48a-af66-6aa4-42316fd3a669@gmail.com> This message is aimed at anyone with an interest in the OpenStack Block Storage Service, whether as an operator, a user, or a developer. Like all the other teams, Cinder would like to get feedback from operators and users about the current state of the software, get some ideas about what should be in the next release, and have some strategic discussion about The Future. So if you have some ideas you'd like to be considered, feel free to propose a topic: https://etherpad.opendev.org/p/2020-Wallaby-cinder-brainstorming You only need to add a sentence or two describing your topic and it doesn't have to be very polished, so if you have an idea, just go to the etherpad and slap it down now while you're thinking about it. The deadline for proposals to the Foundation is 14 September, so if you could get your idea down on the etherpad before the Cinder weekly meeting on Wednesday 9 September 14:00 UTC, that will give the Cinder team time to look them over. thanks! brian From amuller at redhat.com Wed Aug 19 16:50:04 2020 From: amuller at redhat.com (Assaf Muller) Date: Wed, 19 Aug 2020 12:50:04 -0400 Subject: [neutron][ops] API for viewing HA router states In-Reply-To: References: <6613245.ccrTHCtBl7@antares> Message-ID: On Tue, Aug 18, 2020 at 10:30 PM Mohammed Naser wrote: > > On Tue, Aug 18, 2020 at 10:53 AM Assaf Muller wrote: > > > > On Tue, Aug 18, 2020 at 8:12 AM Jonas Schäfer > > wrote: > > > > > > Hi Mohammed and all, > > > > > > On Montag, 17. August 2020 14:01:55 CEST Mohammed Naser wrote: > > > > Over the past few days, we were troubleshooting an issue that ended up > > > > having a root cause where keepalived has somehow ended up active in > > > > two different L3 agents. We've yet to find the root cause of how this > > > > happened but removing it and adding it resolved the issue for us. > > > > > > We’ve also seen that behaviour occasionally. The root cause is also unclear > > > for us (so we would’ve love to hear about that). > > > > Insert shameless plug for the Neutron OVN backend. One of it's > > advantages is that it's L3 HA architecture is cleaner and more > > scalable (this is coming from the dude that wrote the L3 HA code we're > > all suffering from =D). 
The ML2/OVS L3 HA architecture has it's issues > > - I've seen it work at 100's of customer sites at scale, so I don't > > want to knock it too much, but just a day ago I got an internal > > customer ticket about keepalived falling over on a particular router > > that has 200 floating IPs. It works but it's not perfect. I'm sure the > > OVN implementation isn't either but it's simply cleaner and has less > > moving parts. It uses BFD to monitor the tunnel endpoints, so failover > > is faster too. Plus, it doesn't use keepalived. > > > > OVN is something we're looking at and we're very excited about, > unfortunately, there seems to be a bunch of gaps in documentation Can you elaborate? If you can write down a list of gaps we can address that. > right now as well as a lot of the migration scripts to OVN are > TripleO-y. > > So it'll take time to get us there, but yes, OVN simplifies this greatly > > > > We have anecdotal evidence > > > that a rabbitmq failure was involved, although that makes no sense to me > > > personally. Other causes may be incorrectly cleaned-up namespaces (for > > > example, when you kill or hard-restart the l3 agent, the namespaces will stay > > > around, possibly with the IP address assigned; the keepalived on the other l3 > > > agents will not see the VRRP advertisments anymore and will ALSO assign the IP > > > address. This will also be rectified by a restart always and may require > > > manual namespace cleanup with a tool, a node reboot or an agent disable/enable > > > cycle.). > > > > > > > As we work on improving our monitoring, we wanted to implement > > > > something that gets us the info of # of active routers to check if > > > > there's a router that has >1 active L3 agent but it's hard because > > > > hitting the /l3-agents endpoint on _every_ single router hurts a lot > > > > on performance. > > > > > > > > Is there something else that we can watch which might be more > > > > productive? FYI -- this all goes in the open and will end up inside > > > > the openstack-exporter: > > > > https://github.com/openstack-exporter/openstack-exporter and the Helm > > > > charts will end up with the alerts: > > > > https://github.com/openstack-exporter/helm-charts > > > > > > While I don’t think it fits in your openstack-exporter design, we are > > > currently using the attached script (which we also hereby publish under the > > > terms of the Apache 2.0 license [1]). (Sorry, I lack the time to cleanly > > > publish it somewhere right now.) > > > > > > It checks the state files maintained by the L3 agent conglomerate and exports > > > metrics about the master-ness of the routers as prometheus metrics. > > > > > > Note that this is slightly dangerous since the router IDs are high-cardinality > > > and using that as a label value in Prometheus is discouraged; you may not want > > > to do this in a public cloud setting. > > > > > > Either way: This allows us to alert on routers where there is not exactly one > > > master state. Downside is that this requires the thing to run locally on the > > > l3 agent nodes. Upside is that it is very efficient, and will also show the > > > master state in some cases where the router was not cleaned up properly (e.g. > > > because the l3 agent and its keepaliveds were killed). 
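The script itself is not included in this archive, but a minimal sketch of the approach Jonas describes might look like the following. It assumes the L3 agent writes each HA router's keepalived state to a file named "state" under /var/lib/neutron/ha_confs/<router_id>/ (the directory comes from the agent's state_path and can differ per deployment), that the file contains values such as "master" or "backup", and that the prometheus_client library is installed; it is an illustration of the idea, not the script published above.

# l3_ha_state_exporter.py -- minimal sketch, not the script referenced above.
# Assumptions: state files live at /var/lib/neutron/ha_confs/<router_id>/state
# and contain "master", "backup" or similar; prometheus_client is available.
import os
import time
from prometheus_client import Gauge, start_http_server

HA_CONF_DIR = "/var/lib/neutron/ha_confs"  # assumed default state_path layout
EXPORTER_PORT = 9103                       # arbitrary port for this sketch

ROUTER_MASTER = Gauge(
    "neutron_l3_ha_router_master",
    "1 if keepalived on this node reports master for the router, else 0",
    ["router_id"],  # router_id labels are high-cardinality, as cautioned above
)

def scrape():
    if not os.path.isdir(HA_CONF_DIR):
        return
    for router_id in os.listdir(HA_CONF_DIR):
        state_file = os.path.join(HA_CONF_DIR, router_id, "state")
        try:
            with open(state_file) as handle:
                state = handle.read().strip().lower()
        except OSError:
            continue  # router directory without a readable state file
        ROUTER_MASTER.labels(router_id=router_id).set(1 if state == "master" else 0)

if __name__ == "__main__":
    start_http_server(EXPORTER_PORT)
    while True:
        scrape()
        time.sleep(30)

Summing neutron_l3_ha_router_master per router across all network nodes and alerting when the result is not exactly 1 reproduces the "exactly one master" check described above.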
> > > kind regards, > > > Jonas > > > > > > [1]: http://www.apache.org/licenses/LICENSE-2.0 > > > -- > > > Jonas Schäfer > > > DevOps Engineer > > > > > > Cloud&Heat Technologies GmbH > > > Königsbrücker Straße 96 | 01099 Dresden > > > +49 351 479 367 37 > > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > > > > > New Service: > > > Managed Kubernetes designed for AI & ML > > > https://managed-kubernetes.cloudandheat.com/ > > > > > > Commercial Register: District Court Dresden > > > Register Number: HRB 30549 > > > VAT ID No.: DE281093504 > > > Managing Director: Nicolas Röhrs > > > Authorized signatory: Dr. Marius Feldmann > > > Authorized signatory: Kristina Rübenkamp > > > > > > > -- > Mohammed Naser > VEXXHOST, Inc. > From ildiko.vancsa at gmail.com Wed Aug 19 17:12:26 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Wed, 19 Aug 2020 19:12:26 +0200 Subject: [upstream-institute] Virtual training mentor sign-up and planning In-Reply-To: References: Message-ID: Hi, It is a friendly reminder to please sign up on the wiki[1] if you are interested in participating in the virtual version of the Upstream Institute training as a mentor. We will start planning soon to ensure that we have the format and the materials adjusted to the new circumstances. Please let me know if you have any questions. Thanks. Ildikó [1] https://wiki.openstack.org/wiki/OpenStack_Upstream_Institute_Occasions#Virtual_Training.2C_2020 > On Aug 10, 2020, at 14:31, Ildiko Vancsa wrote: > > Hi mentors, > > I’m reaching out to you as the next Open Infrastructure Summit is approaching quickly so it is time to start planning for the next OpenStack Upstream Institute. > > As the next event will be virtual we will need to re-think the training format and experience to make sure our audience gets the most out of it. > > I created a new entry on our training occasions wiki page here: https://wiki.openstack.org/wiki/OpenStack_Upstream_Institute_Occasions#Virtual_Training.2C_2020 > > Please __sign up on the wiki__ if you would like to participate in the preparations and running the virtual training. > > As it is still vacation season I think we can target the last week of August or first week of September to have the first prep meeting and can collect ideas here or discuss them on the #openstack-upstream-institute IRC channel on Freenode in the meantime. > > Please let me know if you have any questions or need any help with signing up on the wiki. > > Thanks and Best Regards, > Ildikó > > From jasowang at redhat.com Wed Aug 19 02:38:13 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 10:38:13 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818091628.GC20215@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> Message-ID: <5aea4ae6-e8c8-1120-453d-20a78cee6b20@redhat.com> On 2020/8/18 下午5:16, Daniel P. Berrangé wrote: > Your mail came through as HTML-only so all the quoting and attribution > is mangled / lost now :-( My bad, sorry. > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: >> On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: >> >> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: >> >> On 2020/8/14 下午1:16, Yan Zhao wrote: >> >> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: >> >> On 2020/8/10 下午3:46, Yan Zhao wrote: >> we actually can also retrieve the same information through sysfs, .e.g >> >> |- [path to device] >> |--- migration >> | |--- self >> | | |---device_api >> | | |---mdev_type >> | | |---software_version >> | | |---device_id >> | | |---aggregator >> | |--- compatible >> | | |---device_api >> | | |---mdev_type >> | | |---software_version >> | | |---device_id >> | | |---aggregator >> >> >> Yes but: >> >> - You need one file per attribute (one syscall for one attribute) >> - Attribute is coupled with kobject >> >> All of above seems unnecessary. >> >> Another point, as we discussed in another thread, it's really hard to make >> sure the above API work for all types of devices and frameworks. So having a >> vendor specific API looks much better. >> >> From the POV of userspace mgmt apps doing device compat checking / migration, >> we certainly do NOT want to use different vendor specific APIs. We want to >> have an API that can be used / controlled in a standard manner across vendors. >> >> Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a >> long debate on sysfs vs devlink). So if we go with sysfs, at least two >> APIs needs to be supported ... > NB, I was not questioning devlink vs sysfs directly. If devlink is related > to netlink, I can't say I'm enthusiastic as IMKE sysfs is easier to deal > with. I don't know enough about devlink to have much of an opinion though. > The key point was that I don't want the userspace APIs we need to deal with > to be vendor specific. > > What I care about is that we have a *standard* userspace API for performing > device compatibility checking / state migration, for use by QEMU/libvirt/ > OpenStack, such that we can write code without countless vendor specific > code paths. > > If there is vendor specific stuff on the side, that's fine as we can ignore > that, but the core functionality for device compat / migration needs to be > standardized. Ok, I agree with you. Thanks > > Regards, > Daniel From jasowang at redhat.com Wed Aug 19 02:45:57 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 10:45:57 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> Message-ID: <934c8d2a-a34e-6c68-0e53-5de2a8f49d19@redhat.com> On 2020/8/18 下午5:32, Parav Pandit wrote: > Hi Jason, > > From: Jason Wang > Sent: Tuesday, August 18, 2020 2:32 PM > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > On 2020/8/14 下午1:16, Yan Zhao wrote: > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > On 2020/8/10 下午3:46, Yan Zhao wrote: > driver is it handled by? 
> It looks that the devlink is for network device specific, and in > devlink.h, it says > include/uapi/linux/devlink.h - Network physical device Netlink > interface, > Actually not, I think there used to have some discussion last year and the > conclusion is to remove this comment. > > [...] > >> Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a long debate on sysfs vs devlink). So if we go with sysfs, at least two APIs needs to be supported ... > We had internal discussion and proposal on this topic. > I wanted Eli Cohen to be back from vacation on Wed 8/19, but since this is active discussion right now, I will share the thoughts anyway. > > Here are the initial round of thoughts and proposal. > > User requirements: > --------------------------- > 1. User might want to create one or more vdpa devices per PCI PF/VF/SF. > 2. User might want to create one or more vdpa devices of type net/blk or other type. > 3. User needs to look and dump at the health of the queues for debug purpose. > 4. During vdpa net device creation time, user may have to provide a MAC address and/or VLAN. > 5. User should be able to set/query some of the attributes for debug/compatibility check > 6. When user wants to create vdpa device, it needs to know which device supports creation. > 7. User should be able to see the queue statistics of doorbells, wqes etc regardless of class type Note that wqes is probably not something common in all of the vendors. > > To address above requirements, there is a need of vendor agnostic tool, so that user can create/config/delete vdpa device(s) regardless of the vendor. > > Hence, > We should have a tool that lets user do it. > > Examples: > ------------- > (a) List parent devices which supports creating vdpa devices. > It also shows which class types supported by this parent device. > In below command two parent devices support vdpa device creation. > First is PCI VF whose bdf is 03.00:5. > Second is PCI SF whose name is mlx5_sf.1 > > $ vdpa list pd What did "pd" mean? > pci/0000:03.00:5 > class_supports > net vdpa > virtbus/mlx5_sf.1 So creating mlx5_sf.1 is the charge of devlink? > class_supports > net > > (b) Now add a vdpa device and show the device. > $ vdpa dev add pci/0000:03.00:5 type net So if you want to create devices types other than vdpa on pci/0000:03.00:5 it needs some synchronization with devlink? > $ vdpa dev show > vdpa0 at pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues 4 > > (c) vdpa dev show features vdpa0 > iommu platform > version 1 > > (d) dump vdpa statistics > $ vdpa dev stats show vdpa0 > kickdoorbells 10 > wqes 100 > > (e) Now delete a vdpa device previously created. > $ vdpa dev del vdpa0 > > Design overview: > ----------------------- > 1. Above example tool runs over netlink socket interface. > 2. This enables users to return meaningful error strings in addition to code so that user can be more informed. > Often this is missing in ioctl()/configfs/sysfs interfaces. > 3. This tool over netlink enables syscaller tests to be more usable like other subsystems to keep kernel robust > 4. This provides vendor agnostic view of all vdpa capable parent and vdpa devices. > > 5. Each driver which supports vdpa device creation, registers the parent device along with supported classes. > > FAQs: > -------- > 1. Why not using devlink? > Ans: Because as vdpa echo system grows, devlink will fall short of extending vdpa specific params, attributes, stats. 
This should be fine but it's still not clear to me the difference between a vdpa netlink and a vdpa object in devlink. Thanks > > 2. Why not use sysfs? > Ans: > (a) Because running syscaller infrastructure can run well over netlink sockets like it runs for several subsystem. > (b) it lacks the ability to return error messages. Doing via kernel log is just doesn't work. > (c) Why not using some ioctl()? It will reinvent the wheel of netlink that has TLV formats for several attributes. > > 3. Why not configs? > It follows same limitation as that of sysfs. > > Low level design and driver APIS: > -------------------------------------------- > Will post once we discuss this further. From jasowang at redhat.com Wed Aug 19 02:54:07 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 10:54:07 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818113652.5d81a392.cohuck@redhat.com> References: <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> Message-ID: On 2020/8/18 下午5:36, Cornelia Huck wrote: > On Tue, 18 Aug 2020 10:16:28 +0100 > Daniel P. Berrangé wrote: > >> On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: >>> On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: >>> >>> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: >>> >>> On 2020/8/14 下午1:16, Yan Zhao wrote: >>> >>> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: >>> >>> On 2020/8/10 下午3:46, Yan Zhao wrote: >>> we actually can also retrieve the same information through sysfs, .e.g >>> >>> |- [path to device] >>> |--- migration >>> | |--- self >>> | | |---device_api >>> | | |---mdev_type >>> | | |---software_version >>> | | |---device_id >>> | | |---aggregator >>> | |--- compatible >>> | | |---device_api >>> | | |---mdev_type >>> | | |---software_version >>> | | |---device_id >>> | | |---aggregator >>> >>> >>> Yes but: >>> >>> - You need one file per attribute (one syscall for one attribute) >>> - Attribute is coupled with kobject > Is that really that bad? You have the device with an embedded kobject > anyway, and you can just put things into an attribute group? Yes, but all of this could be done via devlink(netlink) as well with low overhead. > > [Also, I think that self/compatible split in the example makes things > needlessly complex. Shouldn't semantic versioning and matching already > cover nearly everything? That's my question as well. E.g for virtio, versioning may not even work, some of features are negotiated independently: Source features: A, B, C Dest features: A, B, C, E We just need to make sure the dest features is a superset of source then all set. > I would expect very few cases that are more > complex than that. Maybe the aggregation stuff, but I don't think we > need that self/compatible split for that, either.] > >>> All of above seems unnecessary. >>> >>> Another point, as we discussed in another thread, it's really hard to make >>> sure the above API work for all types of devices and frameworks. So having a >>> vendor specific API looks much better. >>> >>> From the POV of userspace mgmt apps doing device compat checking / migration, >>> we certainly do NOT want to use different vendor specific APIs. 
We want to >>> have an API that can be used / controlled in a standard manner across vendors. >>> >>> Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a >>> long debate on sysfs vs devlink). So if we go with sysfs, at least two >>> APIs needs to be supported ... >> NB, I was not questioning devlink vs sysfs directly. If devlink is related >> to netlink, I can't say I'm enthusiastic as IMKE sysfs is easier to deal >> with. I don't know enough about devlink to have much of an opinion though. >> The key point was that I don't want the userspace APIs we need to deal with >> to be vendor specific. > From what I've seen of devlink, it seems quite nice; but I understand > why sysfs might be easier to deal with (especially as there's likely > already a lot of code using it.) > > I understand that some users would like devlink because it is already > widely used for network drivers (and some others), but I don't think > the majority of devices used with vfio are network (although certainly > a lot of them are.) Note that though devlink could be popular only in network devices, netlink is widely used by a lot of subsystesm (e.g SCSI). Thanks > >> What I care about is that we have a *standard* userspace API for performing >> device compatibility checking / state migration, for use by QEMU/libvirt/ >> OpenStack, such that we can write code without countless vendor specific >> code paths. >> >> If there is vendor specific stuff on the side, that's fine as we can ignore >> that, but the core functionality for device compat / migration needs to be >> standardized. > To summarize: > - choose one of sysfs or devlink > - have a common interface, with a standardized way to add > vendor-specific attributes > ? From yan.y.zhao at intel.com Wed Aug 19 03:30:35 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 19 Aug 2020 11:30:35 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> Message-ID: <20200819033035.GA21172@joy-OptiPlex-7040> On Tue, Aug 18, 2020 at 09:39:24AM +0000, Parav Pandit wrote: > Hi Cornelia, > > > From: Cornelia Huck > > Sent: Tuesday, August 18, 2020 3:07 PM > > To: Daniel P. Berrangé > > Cc: Jason Wang ; Yan Zhao > > ; kvm at vger.kernel.org; libvir-list at redhat.com; > > qemu-devel at nongnu.org; Kirti Wankhede ; > > eauger at redhat.com; xin-ran.wang at intel.com; corbet at lwn.net; openstack- > > discuss at lists.openstack.org; shaohe.feng at intel.com; kevin.tian at intel.com; > > Parav Pandit ; jian-feng.ding at intel.com; > > dgilbert at redhat.com; zhenyuw at linux.intel.com; hejie.xu at intel.com; > > bao.yumeng at zte.com.cn; Alex Williamson ; > > eskultet at redhat.com; smooney at redhat.com; intel-gvt- > > dev at lists.freedesktop.org; Jiri Pirko ; > > dinechin at redhat.com; devel at ovirt.org > > Subject: Re: device compatibility interface for live migration with assigned > > devices > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > we actually can also retrieve the same information through sysfs, > > > > .e.g > > > > > > > > |- [path to device] > > > > |--- migration > > > > | |--- self > > > > | | |---device_api > > > > | | |---mdev_type > > > > | | |---software_version > > > > | | |---device_id > > > > | | |---aggregator > > > > | |--- compatible > > > > | | |---device_api > > > > | | |---mdev_type > > > > | | |---software_version > > > > | | |---device_id > > > > | | |---aggregator > > > > > > > > > > > > Yes but: > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > - Attribute is coupled with kobject > > > > Is that really that bad? You have the device with an embedded kobject > > anyway, and you can just put things into an attribute group? > > > > [Also, I think that self/compatible split in the example makes things > > needlessly complex. Shouldn't semantic versioning and matching already > > cover nearly everything? I would expect very few cases that are more > > complex than that. Maybe the aggregation stuff, but I don't think we need > > that self/compatible split for that, either.] > > > > > > > > > > All of above seems unnecessary. > > > > > > > > Another point, as we discussed in another thread, it's really hard > > > > to make sure the above API work for all types of devices and > > > > frameworks. So having a vendor specific API looks much better. > > > > > > > > From the POV of userspace mgmt apps doing device compat checking / > > > > migration, we certainly do NOT want to use different vendor > > > > specific APIs. We want to have an API that can be used / controlled in a > > standard manner across vendors. > > > > > > > > Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a > > > > long debate on sysfs vs devlink). So if we go with sysfs, at least two > > > > APIs needs to be supported ... > > > > > > NB, I was not questioning devlink vs sysfs directly. If devlink is > > > related to netlink, I can't say I'm enthusiastic as IMKE sysfs is > > > easier to deal with. I don't know enough about devlink to have much of an > > opinion though. > > > The key point was that I don't want the userspace APIs we need to deal > > > with to be vendor specific. > > > > From what I've seen of devlink, it seems quite nice; but I understand why > > sysfs might be easier to deal with (especially as there's likely already a lot of > > code using it.) > > > > I understand that some users would like devlink because it is already widely > > used for network drivers (and some others), but I don't think the majority of > > devices used with vfio are network (although certainly a lot of them are.) > > > > > > > > What I care about is that we have a *standard* userspace API for > > > performing device compatibility checking / state migration, for use by > > > QEMU/libvirt/ OpenStack, such that we can write code without countless > > > vendor specific code paths. > > > > > > If there is vendor specific stuff on the side, that's fine as we can > > > ignore that, but the core functionality for device compat / migration > > > needs to be standardized. 
> > > > To summarize: > > - choose one of sysfs or devlink > > - have a common interface, with a standardized way to add > > vendor-specific attributes > > ? > > Please refer to my previous email which has more example and details. hi Parav, the example is based on a new vdpa tool running over netlink, not based on devlink, right? For vfio migration compatibility, we have to deal with both mdev and physical pci devices, I don't think it's a good idea to write a new tool for it, given we are able to retrieve the same info from sysfs and there's already an mdevctl from Alex (https://github.com/mdevctl/mdevctl). hi All, could we decide that sysfs is the interface that every VFIO vendor driver needs to provide in order to support vfio live migration, otherwise the userspace management tool would not list the device into the compatible list? if that's true, let's move to the standardizing of the sysfs interface. (1) content common part: (must) - software_version: (in major.minor.bugfix scheme) - device_api: vfio-pci or vfio-ccw ... - type: mdev type for mdev device or a signature for physical device which is a counterpart for mdev type. device api specific part: (must) - pci id: pci id of mdev parent device or pci id of physical pci device (device_api is vfio-pci) - subchannel_type (device_api is vfio-ccw) vendor driver specific part: (optional) - aggregator - chpid_type - remote_url NOTE: vendors are free to add attributes in this part with a restriction that this attribute is able to be configured with the same name in sysfs too. e.g. for aggregator, there must be a sysfs attribute in device node /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, so that the userspace tool is able to configure the target device according to source device's aggregator attribute. (2) where and structure proposal 1: |- [path to device] |--- migration | |--- self | | |-software_version | | |-device_api | | |-type | | |-[pci_id or subchannel_type] | | |- | |--- compatible | | |-software_version | | |-device_api | | |-type | | |-[pci_id or subchannel_type] | | |- multiple compatible is allowed. attributes should be ASCII text files, preferably with only one value per file. proposal 2: use bin_attribute. |- [path to device] |--- migration | |--- self | |--- compatible so we can continue use multiline format. e.g. cat compatible software_version=0.1.0 device_api=vfio_pci type=i915-GVTg_V5_{val1:int:1,2,4,8} pci_id=80865963 aggregator={val1}/2 Thanks Yan From parav at nvidia.com Wed Aug 19 05:26:58 2020 From: parav at nvidia.com (Parav Pandit) Date: Wed, 19 Aug 2020 05:26:58 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <934c8d2a-a34e-6c68-0e53-5de2a8f49d19@redhat.com> References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <934c8d2a-a34e-6c68-0e53-5de2a8f49d19@redhat.com> Message-ID: > From: Jason Wang > Sent: Wednesday, August 19, 2020 8:16 AM > On 2020/8/18 下午5:32, Parav Pandit wrote: > > Hi Jason, > > > > From: Jason Wang > > Sent: Tuesday, August 18, 2020 2:32 PM > > > > > > On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > driver is it handled by? > > It looks that the devlink is for network device specific, and in > > devlink.h, it says include/uapi/linux/devlink.h - Network physical > > device Netlink interface, Actually not, I think there used to have > > some discussion last year and the conclusion is to remove this > > comment. > > > > [...] > > > >> Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a long > debate on sysfs vs devlink). So if we go with sysfs, at least two APIs needs to be > supported ... > > We had internal discussion and proposal on this topic. > > I wanted Eli Cohen to be back from vacation on Wed 8/19, but since this is > active discussion right now, I will share the thoughts anyway. > > > > Here are the initial round of thoughts and proposal. > > > > User requirements: > > --------------------------- > > 1. User might want to create one or more vdpa devices per PCI PF/VF/SF. > > 2. User might want to create one or more vdpa devices of type net/blk or > other type. > > 3. User needs to look and dump at the health of the queues for debug purpose. > > 4. During vdpa net device creation time, user may have to provide a MAC > address and/or VLAN. > > 5. User should be able to set/query some of the attributes for > > debug/compatibility check 6. When user wants to create vdpa device, it needs > to know which device supports creation. > > 7. User should be able to see the queue statistics of doorbells, wqes > > etc regardless of class type > > > Note that wqes is probably not something common in all of the vendors. Yes. I virtq descriptors stats is better to monitor the virtqueues. > > > > > > To address above requirements, there is a need of vendor agnostic tool, so > that user can create/config/delete vdpa device(s) regardless of the vendor. > > > > Hence, > > We should have a tool that lets user do it. > > > > Examples: > > ------------- > > (a) List parent devices which supports creating vdpa devices. > > It also shows which class types supported by this parent device. > > In below command two parent devices support vdpa device creation. > > First is PCI VF whose bdf is 03.00:5. > > Second is PCI SF whose name is mlx5_sf.1 > > > > $ vdpa list pd > > > What did "pd" mean? > Parent device which support creation of one or more vdpa devices. In a system there can be multiple parent devices which may be support vdpa creation. User should be able to know which devices support it, and when user creates a vdpa device, it tells which parent device to use for creation as done in below vdpa dev add example. > > > pci/0000:03.00:5 > > class_supports > > net vdpa > > virtbus/mlx5_sf.1 > > > So creating mlx5_sf.1 is the charge of devlink? > Yes. But here vdpa tool is working at the parent device identifier {bus+name} instead of devlink identifier. > > > class_supports > > net > > > > (b) Now add a vdpa device and show the device. > > $ vdpa dev add pci/0000:03.00:5 type net > > > So if you want to create devices types other than vdpa on > pci/0000:03.00:5 it needs some synchronization with devlink? Please refer to FAQ-1, a new tool is not linked to devlink because vdpa will evolve with time and devlink will fall short. So no, it doesn't need any synchronization with devlink. As long as parent device exist, user can create it. 
All synchronization will be within drivers/vdpa/vdpa.c This user interface is exposed via new netlink family by doing genl_register_family() with new name "vdpa" in drivers/vdpa/vdpa.c. > > > > $ vdpa dev show > > vdpa0 at pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues 4 > > > > (c) vdpa dev show features vdpa0 > > iommu platform > > version 1 > > > > (d) dump vdpa statistics > > $ vdpa dev stats show vdpa0 > > kickdoorbells 10 > > wqes 100 > > > > (e) Now delete a vdpa device previously created. > > $ vdpa dev del vdpa0 > > > > Design overview: > > ----------------------- > > 1. Above example tool runs over netlink socket interface. > > 2. This enables users to return meaningful error strings in addition to code so > that user can be more informed. > > Often this is missing in ioctl()/configfs/sysfs interfaces. > > 3. This tool over netlink enables syscaller tests to be more usable like other > subsystems to keep kernel robust > > 4. This provides vendor agnostic view of all vdpa capable parent and vdpa > devices. > > > > 5. Each driver which supports vdpa device creation, registers the parent device > along with supported classes. > > > > FAQs: > > -------- > > 1. Why not using devlink? > > Ans: Because as vdpa echo system grows, devlink will fall short of extending > vdpa specific params, attributes, stats. > > > This should be fine but it's still not clear to me the difference > between a vdpa netlink and a vdpa object in devlink. > The difference is a vdpa specific tool work at the parent device level. It is likely more appropriate to because it can self-contain everything needed to create/delete devices, view/set features, stats. Trying to put that in devlink will fall short as devlink doesn’t have vdpa definitions. Typically when a class/device subsystem grows, its own tool is wiser like iproute2/ip, iproute2/tc, iproute2/rdma. From parav at nvidia.com Wed Aug 19 05:58:12 2020 From: parav at nvidia.com (Parav Pandit) Date: Wed, 19 Aug 2020 05:58:12 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819033035.GA21172@joy-OptiPlex-7040> References: <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> Message-ID: > From: Yan Zhao > Sent: Wednesday, August 19, 2020 9:01 AM > On Tue, Aug 18, 2020 at 09:39:24AM +0000, Parav Pandit wrote: > > Please refer to my previous email which has more example and details. > hi Parav, > the example is based on a new vdpa tool running over netlink, not based on > devlink, right? Right. > For vfio migration compatibility, we have to deal with both mdev and physical > pci devices, I don't think it's a good idea to write a new tool for it, given we are > able to retrieve the same info from sysfs and there's already an mdevctl from mdev attribute should be visible in the mdev's sysfs tree. I do not propose to write a new mdev tool over netlink. I am sorry if I implied that with my suggestion of vdpa tool. If underlying device is vdpa, mdev might be able to understand vdpa device and query from it and populate in mdev sysfs tree. The vdpa tool I propose is usable even without mdevs. vdpa tool's role is to create one or more vdpa devices and place on the "vdpa" bus which is the lowest layer here. 
Additionally this tool let user query virtqueue stats, db stats. When a user creates vdpa net device, user may need to configure features of the vdpa device such as VIRTIO_NET_F_MAC, default VIRTIO_NET_F_MTU. These are vdpa level features, attributes. Mdev is layer above it. > Alex > (https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub. > com%2Fmdevctl%2Fmdevctl&data=02%7C01%7Cparav%40nvidia.com%7C > 0c2691d430304f5ea11308d843f2d84e%7C43083d15727340c1b7db39efd9ccc17 > a%7C0%7C0%7C637334057571911357&sdata=KxH7PwxmKyy9JODut8BWr > LQyOBylW00%2Fyzc4rEvjUvA%3D&reserved=0). > Sorry for above link mangling. Our mail server is still transitioning due to company acquisition. I am less familiar on below points to comment. > hi All, > could we decide that sysfs is the interface that every VFIO vendor driver needs > to provide in order to support vfio live migration, otherwise the userspace > management tool would not list the device into the compatible list? > > if that's true, let's move to the standardizing of the sysfs interface. > (1) content > common part: (must) > - software_version: (in major.minor.bugfix scheme) > - device_api: vfio-pci or vfio-ccw ... > - type: mdev type for mdev device or > a signature for physical device which is a counterpart for > mdev type. > > device api specific part: (must) > - pci id: pci id of mdev parent device or pci id of physical pci > device (device_api is vfio-pci) > - subchannel_type (device_api is vfio-ccw) > > vendor driver specific part: (optional) > - aggregator > - chpid_type > - remote_url > > NOTE: vendors are free to add attributes in this part with a restriction that this > attribute is able to be configured with the same name in sysfs too. e.g. > for aggregator, there must be a sysfs attribute in device node > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180- > 078a62063ab1/intel_vgpu/aggregator, > so that the userspace tool is able to configure the target device according to > source device's aggregator attribute. > > > (2) where and structure > proposal 1: > |- [path to device] > |--- migration > | |--- self > | | |-software_version > | | |-device_api > | | |-type > | | |-[pci_id or subchannel_type] > | | |- > | |--- compatible > | | |-software_version > | | |-device_api > | | |-type > | | |-[pci_id or subchannel_type] > | | |- > multiple compatible is allowed. > attributes should be ASCII text files, preferably with only one value per file. > > > proposal 2: use bin_attribute. > |- [path to device] > |--- migration > | |--- self > | |--- compatible > > so we can continue use multiline format. e.g. 
> cat compatible > software_version=0.1.0 > device_api=vfio_pci > type=i915-GVTg_V5_{val1:int:1,2,4,8} > pci_id=80865963 > aggregator={val1}/2 > > Thanks > Yan From jasowang at redhat.com Wed Aug 19 06:48:34 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 14:48:34 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <934c8d2a-a34e-6c68-0e53-5de2a8f49d19@redhat.com> Message-ID: <115147a9-3d8c-aa95-c43d-251a321ac152@redhat.com> On 2020/8/19 下午1:26, Parav Pandit wrote: > >> From: Jason Wang >> Sent: Wednesday, August 19, 2020 8:16 AM > >> On 2020/8/18 下午5:32, Parav Pandit wrote: >>> Hi Jason, >>> >>> From: Jason Wang >>> Sent: Tuesday, August 18, 2020 2:32 PM >>> >>> >>> On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: >>> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: >>> On 2020/8/14 下午1:16, Yan Zhao wrote: >>> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: >>> On 2020/8/10 下午3:46, Yan Zhao wrote: >>> driver is it handled by? >>> It looks that the devlink is for network device specific, and in >>> devlink.h, it says include/uapi/linux/devlink.h - Network physical >>> device Netlink interface, Actually not, I think there used to have >>> some discussion last year and the conclusion is to remove this >>> comment. >>> >>> [...] >>> >>>> Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a long >> debate on sysfs vs devlink). So if we go with sysfs, at least two APIs needs to be >> supported ... >>> We had internal discussion and proposal on this topic. >>> I wanted Eli Cohen to be back from vacation on Wed 8/19, but since this is >> active discussion right now, I will share the thoughts anyway. >>> Here are the initial round of thoughts and proposal. >>> >>> User requirements: >>> --------------------------- >>> 1. User might want to create one or more vdpa devices per PCI PF/VF/SF. >>> 2. User might want to create one or more vdpa devices of type net/blk or >> other type. >>> 3. User needs to look and dump at the health of the queues for debug purpose. >>> 4. During vdpa net device creation time, user may have to provide a MAC >> address and/or VLAN. >>> 5. User should be able to set/query some of the attributes for >>> debug/compatibility check 6. When user wants to create vdpa device, it needs >> to know which device supports creation. >>> 7. User should be able to see the queue statistics of doorbells, wqes >>> etc regardless of class type >> >> Note that wqes is probably not something common in all of the vendors. > Yes. I virtq descriptors stats is better to monitor the virtqueues. > >> >>> To address above requirements, there is a need of vendor agnostic tool, so >> that user can create/config/delete vdpa device(s) regardless of the vendor. >>> Hence, >>> We should have a tool that lets user do it. >>> >>> Examples: >>> ------------- >>> (a) List parent devices which supports creating vdpa devices. >>> It also shows which class types supported by this parent device. >>> In below command two parent devices support vdpa device creation. >>> First is PCI VF whose bdf is 03.00:5. 
>>> Second is PCI SF whose name is mlx5_sf.1 >>> >>> $ vdpa list pd >> >> What did "pd" mean? >> > Parent device which support creation of one or more vdpa devices. > In a system there can be multiple parent devices which may be support vdpa creation. > User should be able to know which devices support it, and when user creates a vdpa device, it tells which parent device to use for creation as done in below vdpa dev add example. >>> pci/0000:03.00:5 >>> class_supports >>> net vdpa >>> virtbus/mlx5_sf.1 >> >> So creating mlx5_sf.1 is the charge of devlink? >> > Yes. > But here vdpa tool is working at the parent device identifier {bus+name} instead of devlink identifier. > > >>> class_supports >>> net >>> >>> (b) Now add a vdpa device and show the device. >>> $ vdpa dev add pci/0000:03.00:5 type net >> >> So if you want to create devices types other than vdpa on >> pci/0000:03.00:5 it needs some synchronization with devlink? > Please refer to FAQ-1, a new tool is not linked to devlink because vdpa will evolve with time and devlink will fall short. > So no, it doesn't need any synchronization with devlink. > As long as parent device exist, user can create it. > All synchronization will be within drivers/vdpa/vdpa.c > This user interface is exposed via new netlink family by doing genl_register_family() with new name "vdpa" in drivers/vdpa/vdpa.c. Just to make sure I understand here. Consider we had virtbus/mlx5_sf.1. Process A want to create a vDPA instance on top of it but Process B want to create a IB instance. Then I think some synchronization is needed at at least parent device level? > >> >>> $ vdpa dev show >>> vdpa0 at pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues 4 >>> >>> (c) vdpa dev show features vdpa0 >>> iommu platform >>> version 1 >>> >>> (d) dump vdpa statistics >>> $ vdpa dev stats show vdpa0 >>> kickdoorbells 10 >>> wqes 100 >>> >>> (e) Now delete a vdpa device previously created. >>> $ vdpa dev del vdpa0 >>> >>> Design overview: >>> ----------------------- >>> 1. Above example tool runs over netlink socket interface. >>> 2. This enables users to return meaningful error strings in addition to code so >> that user can be more informed. >>> Often this is missing in ioctl()/configfs/sysfs interfaces. >>> 3. This tool over netlink enables syscaller tests to be more usable like other >> subsystems to keep kernel robust >>> 4. This provides vendor agnostic view of all vdpa capable parent and vdpa >> devices. >>> 5. Each driver which supports vdpa device creation, registers the parent device >> along with supported classes. >>> FAQs: >>> -------- >>> 1. Why not using devlink? >>> Ans: Because as vdpa echo system grows, devlink will fall short of extending >> vdpa specific params, attributes, stats. >> >> >> This should be fine but it's still not clear to me the difference >> between a vdpa netlink and a vdpa object in devlink. >> > The difference is a vdpa specific tool work at the parent device level. > It is likely more appropriate to because it can self-contain everything needed to create/delete devices, view/set features, stats. > Trying to put that in devlink will fall short as devlink doesn’t have vdpa definitions. > Typically when a class/device subsystem grows, its own tool is wiser like iproute2/ip, iproute2/tc, iproute2/rdma. Ok, I see. 
Thanks From parav at nvidia.com Wed Aug 19 06:53:03 2020 From: parav at nvidia.com (Parav Pandit) Date: Wed, 19 Aug 2020 06:53:03 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <115147a9-3d8c-aa95-c43d-251a321ac152@redhat.com> References: <20200805021654.GB30485@joy-OptiPlex-7040> <2624b12f-3788-7e2b-2cb7-93534960bcb7@redhat.com> <20200805075647.GB2177@nanopsycho> <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <934c8d2a-a34e-6c68-0e53-5de2a8f49d19@redhat.com> <115147a9-3d8c-aa95-c43d-251a321ac152@redhat.com> Message-ID: > From: Jason Wang > Sent: Wednesday, August 19, 2020 12:19 PM > > > On 2020/8/19 下午1:26, Parav Pandit wrote: > > > >> From: Jason Wang > >> Sent: Wednesday, August 19, 2020 8:16 AM > > > >> On 2020/8/18 下午5:32, Parav Pandit wrote: > >>> Hi Jason, > >>> > >>> From: Jason Wang > >>> Sent: Tuesday, August 18, 2020 2:32 PM > >>> > >>> > >>> On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > >>> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > >>> On 2020/8/14 下午1:16, Yan Zhao wrote: > >>> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > >>> On 2020/8/10 下午3:46, Yan Zhao wrote: > >>> driver is it handled by? > >>> It looks that the devlink is for network device specific, and in > >>> devlink.h, it says include/uapi/linux/devlink.h - Network physical > >>> device Netlink interface, Actually not, I think there used to have > >>> some discussion last year and the conclusion is to remove this > >>> comment. > >>> > >>> [...] > >>> > >>>> Yes, but it could be hard. E.g vDPA will chose to use devlink > >>>> (there's a long > >> debate on sysfs vs devlink). So if we go with sysfs, at least two > >> APIs needs to be supported ... > >>> We had internal discussion and proposal on this topic. > >>> I wanted Eli Cohen to be back from vacation on Wed 8/19, but since > >>> this is > >> active discussion right now, I will share the thoughts anyway. > >>> Here are the initial round of thoughts and proposal. > >>> > >>> User requirements: > >>> --------------------------- > >>> 1. User might want to create one or more vdpa devices per PCI PF/VF/SF. > >>> 2. User might want to create one or more vdpa devices of type > >>> net/blk or > >> other type. > >>> 3. User needs to look and dump at the health of the queues for debug > purpose. > >>> 4. During vdpa net device creation time, user may have to provide a > >>> MAC > >> address and/or VLAN. > >>> 5. User should be able to set/query some of the attributes for > >>> debug/compatibility check 6. When user wants to create vdpa device, > >>> it needs > >> to know which device supports creation. > >>> 7. User should be able to see the queue statistics of doorbells, > >>> wqes etc regardless of class type > >> > >> Note that wqes is probably not something common in all of the vendors. > > Yes. I virtq descriptors stats is better to monitor the virtqueues. > > > >> > >>> To address above requirements, there is a need of vendor agnostic > >>> tool, so > >> that user can create/config/delete vdpa device(s) regardless of the vendor. > >>> Hence, > >>> We should have a tool that lets user do it. > >>> > >>> Examples: > >>> ------------- > >>> (a) List parent devices which supports creating vdpa devices. > >>> It also shows which class types supported by this parent device. 
> >>> In below command two parent devices support vdpa device creation. > >>> First is PCI VF whose bdf is 03.00:5. > >>> Second is PCI SF whose name is mlx5_sf.1 > >>> > >>> $ vdpa list pd > >> > >> What did "pd" mean? > >> > > Parent device which support creation of one or more vdpa devices. > > In a system there can be multiple parent devices which may be support vdpa > creation. > > User should be able to know which devices support it, and when user creates a > vdpa device, it tells which parent device to use for creation as done in below > vdpa dev add example. > >>> pci/0000:03.00:5 > >>> class_supports > >>> net vdpa > >>> virtbus/mlx5_sf.1 > >> > >> So creating mlx5_sf.1 is the charge of devlink? > >> > > Yes. > > But here vdpa tool is working at the parent device identifier {bus+name} > instead of devlink identifier. > > > > > >>> class_supports > >>> net > >>> > >>> (b) Now add a vdpa device and show the device. > >>> $ vdpa dev add pci/0000:03.00:5 type net > >> > >> So if you want to create devices types other than vdpa on > >> pci/0000:03.00:5 it needs some synchronization with devlink? > > Please refer to FAQ-1, a new tool is not linked to devlink because vdpa will > evolve with time and devlink will fall short. > > So no, it doesn't need any synchronization with devlink. > > As long as parent device exist, user can create it. > > All synchronization will be within drivers/vdpa/vdpa.c This user > > interface is exposed via new netlink family by doing genl_register_family() with > new name "vdpa" in drivers/vdpa/vdpa.c. > > > Just to make sure I understand here. > > Consider we had virtbus/mlx5_sf.1. Process A want to create a vDPA instance on > top of it but Process B want to create a IB instance. Then I think some > synchronization is needed at at least parent device level? Likely but rdma device will be created either through $ rdma link add command. Or auto created by driver because there is only one without much configuration. While vdpa device(s) for virtbus/mlx5_sf.1 will be created through vdpa subsystem. And vdpa's synchronization will be contained within drivers/vdpa/vdpa.c > > > > > >> > >>> $ vdpa dev show > >>> vdpa0 at pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues > >>> 4 > >>> > >>> (c) vdpa dev show features vdpa0 > >>> iommu platform > >>> version 1 > >>> > >>> (d) dump vdpa statistics > >>> $ vdpa dev stats show vdpa0 > >>> kickdoorbells 10 > >>> wqes 100 > >>> > >>> (e) Now delete a vdpa device previously created. > >>> $ vdpa dev del vdpa0 > >>> > >>> Design overview: > >>> ----------------------- > >>> 1. Above example tool runs over netlink socket interface. > >>> 2. This enables users to return meaningful error strings in addition > >>> to code so > >> that user can be more informed. > >>> Often this is missing in ioctl()/configfs/sysfs interfaces. > >>> 3. This tool over netlink enables syscaller tests to be more usable > >>> like other > >> subsystems to keep kernel robust > >>> 4. This provides vendor agnostic view of all vdpa capable parent and > >>> vdpa > >> devices. > >>> 5. Each driver which supports vdpa device creation, registers the > >>> parent device > >> along with supported classes. > >>> FAQs: > >>> -------- > >>> 1. Why not using devlink? > >>> Ans: Because as vdpa echo system grows, devlink will fall short of > >>> extending > >> vdpa specific params, attributes, stats. > >> > >> > >> This should be fine but it's still not clear to me the difference > >> between a vdpa netlink and a vdpa object in devlink. 
> >> > > The difference is a vdpa specific tool work at the parent device level. > > It is likely more appropriate to because it can self-contain everything needed > to create/delete devices, view/set features, stats. > > Trying to put that in devlink will fall short as devlink doesn’t have vdpa > definitions. > > Typically when a class/device subsystem grows, its own tool is wiser like > iproute2/ip, iproute2/tc, iproute2/rdma. > > > Ok, I see. > > Thanks > From jasowang at redhat.com Wed Aug 19 06:57:34 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 14:57:34 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819033035.GA21172@joy-OptiPlex-7040> References: <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> Message-ID: On 2020/8/19 上午11:30, Yan Zhao wrote: > hi All, > could we decide that sysfs is the interface that every VFIO vendor driver > needs to provide in order to support vfio live migration, otherwise the > userspace management tool would not list the device into the compatible > list? > > if that's true, let's move to the standardizing of the sysfs interface. > (1) content > common part: (must) > - software_version: (in major.minor.bugfix scheme) This can not work for devices whose features can be negotiated/advertised independently. (E.g virtio devices) > - device_api: vfio-pci or vfio-ccw ... > - type: mdev type for mdev device or > a signature for physical device which is a counterpart for > mdev type. > > device api specific part: (must) > - pci id: pci id of mdev parent device or pci id of physical pci > device (device_api is vfio-pci)API here. So this assumes a PCI device which is probably not true. > - subchannel_type (device_api is vfio-ccw) > > vendor driver specific part: (optional) > - aggregator > - chpid_type > - remote_url For "remote_url", just wonder if it's better to integrate or reuse the existing NVME management interface instead of duplicating it here. Otherwise it could be a burden for mgmt to learn. E.g vendor A may use "remote_url" but vendor B may use a different attribute. > > NOTE: vendors are free to add attributes in this part with a > restriction that this attribute is able to be configured with the same > name in sysfs too. e.g. Sysfs works well for common attributes belongs to a class, but I'm not sure it can work well for device/vendor specific attributes. Does this mean mgmt need to iterate all the attributes in both src and dst? > for aggregator, there must be a sysfs attribute in device node > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, > so that the userspace tool is able to configure the target device > according to source device's aggregator attribute. > > > (2) where and structure > proposal 1: > |- [path to device] > |--- migration > | |--- self > | | |-software_version > | | |-device_api > | | |-type > | | |-[pci_id or subchannel_type] > | | |- > | |--- compatible > | | |-software_version > | | |-device_api > | | |-type > | | |-[pci_id or subchannel_type] > | | |- > multiple compatible is allowed. > attributes should be ASCII text files, preferably with only one value > per file. > > > proposal 2: use bin_attribute. 
> |- [path to device] > |--- migration > | |--- self > | |--- compatible > > so we can continue use multiline format. e.g. > cat compatible > software_version=0.1.0 > device_api=vfio_pci > type=i915-GVTg_V5_{val1:int:1,2,4,8} > pci_id=80865963 > aggregator={val1}/2 So basically two questions: - how hard to standardize sysfs API for dealing with compatibility check (to make it work for most types of devices) - how hard for the mgmt to learn with a vendor specific attributes (vs existing management API) Thanks > > Thanks > Yan From yan.y.zhao at intel.com Wed Aug 19 06:59:51 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 19 Aug 2020 14:59:51 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> Message-ID: <20200819065951.GB21172@joy-OptiPlex-7040> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: > > On 2020/8/19 上午11:30, Yan Zhao wrote: > > hi All, > > could we decide that sysfs is the interface that every VFIO vendor driver > > needs to provide in order to support vfio live migration, otherwise the > > userspace management tool would not list the device into the compatible > > list? > > > > if that's true, let's move to the standardizing of the sysfs interface. > > (1) content > > common part: (must) > > - software_version: (in major.minor.bugfix scheme) > > > This can not work for devices whose features can be negotiated/advertised > independently. (E.g virtio devices) > sorry, I don't understand here, why virtio devices need to use vfio interface? I think this thread is discussing about vfio related devices. > > > - device_api: vfio-pci or vfio-ccw ... > > - type: mdev type for mdev device or > > a signature for physical device which is a counterpart for > > mdev type. > > > > device api specific part: (must) > > - pci id: pci id of mdev parent device or pci id of physical pci > > device (device_api is vfio-pci)API here. > > > So this assumes a PCI device which is probably not true. > for device_api of vfio-pci, why it's not true? for vfio-ccw, it's subchannel_type. > > > - subchannel_type (device_api is vfio-ccw) > > vendor driver specific part: (optional) > > - aggregator > > - chpid_type > > - remote_url > > > For "remote_url", just wonder if it's better to integrate or reuse the > existing NVME management interface instead of duplicating it here. Otherwise > it could be a burden for mgmt to learn. E.g vendor A may use "remote_url" > but vendor B may use a different attribute. > it's vendor driver specific. vendor specific attributes are inevitable, and that's why we are discussing here of a way to standardizing of it. our goal is that mgmt can use it without understanding the meaning of vendor specific attributes. > > > > > NOTE: vendors are free to add attributes in this part with a > > restriction that this attribute is able to be configured with the same > > name in sysfs too. e.g. > > > Sysfs works well for common attributes belongs to a class, but I'm not sure > it can work well for device/vendor specific attributes. Does this mean mgmt > need to iterate all the attributes in both src and dst? > no. just attributes under migration directory. 
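(Purely as an illustration of how a management tool could consume proposal 1 above -- this is only a sketch with assumed details, not part of the proposal itself: the sysfs paths, the containment check and the "same major, destination minor not older than source" rule for software_version are placeholder assumptions, and wildcard values such as i915-GVTg_V5_{val1:int:1,2,4,8} or multiple compatible sets are left out.)

import os

def read_attrs(path):
    # one ASCII value per file, as in proposal 1
    attrs = {}
    for name in os.listdir(path):
        with open(os.path.join(path, name)) as f:
            attrs[name] = f.read().strip()
    return attrs

def version_compatible(src, dst):
    # assumed rule: same major, destination minor not older than source
    src_major, src_minor = (int(x) for x in src.split('.')[:2])
    dst_major, dst_minor = (int(x) for x in dst.split('.')[:2])
    return src_major == dst_major and dst_minor >= src_minor

def is_compatible(src_dev, dst_dev):
    src = read_attrs(os.path.join(src_dev, 'migration', 'self'))
    dst = read_attrs(os.path.join(dst_dev, 'migration', 'compatible'))
    for name, value in src.items():
        if name == 'software_version':
            if not version_compatible(value, dst.get(name, '0.0.0')):
                return False
        elif dst.get(name) != value:
            # every other attribute, including vendor specific ones, is
            # treated as an opaque string that must appear in the
            # target's compatible set
            return False
    return True

# e.g. is_compatible('/sys/bus/mdev/devices/<src-uuid>',
#                    '/sys/bus/mdev/devices/<dst-uuid>')

In other words, the tool only reads the files under the migration directory on both sides and compares them as opaque strings, with software_version as the one attribute it needs to interpret.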
> > > for aggregator, there must be a sysfs attribute in device node > > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, > > so that the userspace tool is able to configure the target device > > according to source device's aggregator attribute. > > > > > > (2) where and structure > > proposal 1: > > |- [path to device] > > |--- migration > > | |--- self > > | | |-software_version > > | | |-device_api > > | | |-type > > | | |-[pci_id or subchannel_type] > > | | |- > > | |--- compatible > > | | |-software_version > > | | |-device_api > > | | |-type > > | | |-[pci_id or subchannel_type] > > | | |- > > multiple compatible is allowed. > > attributes should be ASCII text files, preferably with only one value > > per file. > > > > > > proposal 2: use bin_attribute. > > |- [path to device] > > |--- migration > > | |--- self > > | |--- compatible > > > > so we can continue use multiline format. e.g. > > cat compatible > > software_version=0.1.0 > > device_api=vfio_pci > > type=i915-GVTg_V5_{val1:int:1,2,4,8} > > pci_id=80865963 > > aggregator={val1}/2 > > > So basically two questions: > > - how hard to standardize sysfs API for dealing with compatibility check (to > make it work for most types of devices) sorry, I just know we are in the process of standardizing of it :) > - how hard for the mgmt to learn with a vendor specific attributes (vs > existing management API) what is existing management API? Thanks From jasowang at redhat.com Wed Aug 19 07:39:50 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 15:39:50 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819065951.GB21172@joy-OptiPlex-7040> References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> Message-ID: On 2020/8/19 下午2:59, Yan Zhao wrote: > On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: >> On 2020/8/19 上午11:30, Yan Zhao wrote: >>> hi All, >>> could we decide that sysfs is the interface that every VFIO vendor driver >>> needs to provide in order to support vfio live migration, otherwise the >>> userspace management tool would not list the device into the compatible >>> list? >>> >>> if that's true, let's move to the standardizing of the sysfs interface. >>> (1) content >>> common part: (must) >>> - software_version: (in major.minor.bugfix scheme) >> >> This can not work for devices whose features can be negotiated/advertised >> independently. (E.g virtio devices) >> > sorry, I don't understand here, why virtio devices need to use vfio interface? I don't see any reason that virtio devices can't be used by VFIO. Do you? Actually, virtio devices have been used by VFIO for many years: - passthrough a hardware virtio devices to userspace(VM) drivers - using virtio PMD inside guest > I think this thread is discussing about vfio related devices. > >>> - device_api: vfio-pci or vfio-ccw ... >>> - type: mdev type for mdev device or >>> a signature for physical device which is a counterpart for >>> mdev type. >>> >>> device api specific part: (must) >>> - pci id: pci id of mdev parent device or pci id of physical pci >>> device (device_api is vfio-pci)API here. >> >> So this assumes a PCI device which is probably not true. 
>> > for device_api of vfio-pci, why it's not true? > > for vfio-ccw, it's subchannel_type. Ok but having two different attributes for the same file is not good idea. How mgmt know there will be a 3rd type? > >>> - subchannel_type (device_api is vfio-ccw) >>> vendor driver specific part: (optional) >>> - aggregator >>> - chpid_type >>> - remote_url >> >> For "remote_url", just wonder if it's better to integrate or reuse the >> existing NVME management interface instead of duplicating it here. Otherwise >> it could be a burden for mgmt to learn. E.g vendor A may use "remote_url" >> but vendor B may use a different attribute. >> > it's vendor driver specific. > vendor specific attributes are inevitable, and that's why we are > discussing here of a way to standardizing of it. Well, then you will end up with a very long list to discuss. E.g for networking devices, you will have "mac", "v(x)lan" and a lot of other. Note that "remote_url" is not vendor specific but NVME (class/subsystem) specific. The point is that if vendor/class specific part is unavoidable, why not making all of the attributes vendor specific? > our goal is that mgmt can use it without understanding the meaning of vendor > specific attributes. I'm not sure this is the correct design of uAPI. Is there something similar in the existing uAPIs? And it might be hard to work for virtio devices. > >>> NOTE: vendors are free to add attributes in this part with a >>> restriction that this attribute is able to be configured with the same >>> name in sysfs too. e.g. >> >> Sysfs works well for common attributes belongs to a class, but I'm not sure >> it can work well for device/vendor specific attributes. Does this mean mgmt >> need to iterate all the attributes in both src and dst? >> > no. just attributes under migration directory. > >>> for aggregator, there must be a sysfs attribute in device node >>> /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, >>> so that the userspace tool is able to configure the target device >>> according to source device's aggregator attribute. >>> >>> >>> (2) where and structure >>> proposal 1: >>> |- [path to device] >>> |--- migration >>> | |--- self >>> | | |-software_version >>> | | |-device_api >>> | | |-type >>> | | |-[pci_id or subchannel_type] >>> | | |- >>> | |--- compatible >>> | | |-software_version >>> | | |-device_api >>> | | |-type >>> | | |-[pci_id or subchannel_type] >>> | | |- >>> multiple compatible is allowed. >>> attributes should be ASCII text files, preferably with only one value >>> per file. >>> >>> >>> proposal 2: use bin_attribute. >>> |- [path to device] >>> |--- migration >>> | |--- self >>> | |--- compatible >>> >>> so we can continue use multiline format. e.g. >>> cat compatible >>> software_version=0.1.0 >>> device_api=vfio_pci >>> type=i915-GVTg_V5_{val1:int:1,2,4,8} >>> pci_id=80865963 >>> aggregator={val1}/2 >> >> So basically two questions: >> >> - how hard to standardize sysfs API for dealing with compatibility check (to >> make it work for most types of devices) > sorry, I just know we are in the process of standardizing of it :) It's not easy. As I said, the current design can't work for virtio devices and it's not hard to find other examples. I remember some Intel devices have bitmask based capability registers. > >> - how hard for the mgmt to learn with a vendor specific attributes (vs >> existing management API) > what is existing management API? It depends on the type of devices. 
E.g for NVME, we've already had one (/sys/kernel/config/nvme)? Thanks > > Thanks > From yan.y.zhao at intel.com Wed Aug 19 08:13:39 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 19 Aug 2020 16:13:39 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> Message-ID: <20200819081338.GC21172@joy-OptiPlex-7040> On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote: > > On 2020/8/19 下午2:59, Yan Zhao wrote: > > On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: > > > On 2020/8/19 上午11:30, Yan Zhao wrote: > > > > hi All, > > > > could we decide that sysfs is the interface that every VFIO vendor driver > > > > needs to provide in order to support vfio live migration, otherwise the > > > > userspace management tool would not list the device into the compatible > > > > list? > > > > > > > > if that's true, let's move to the standardizing of the sysfs interface. > > > > (1) content > > > > common part: (must) > > > > - software_version: (in major.minor.bugfix scheme) > > > > > > This can not work for devices whose features can be negotiated/advertised > > > independently. (E.g virtio devices) > > > > > sorry, I don't understand here, why virtio devices need to use vfio interface? > > > I don't see any reason that virtio devices can't be used by VFIO. Do you? > > Actually, virtio devices have been used by VFIO for many years: > > - passthrough a hardware virtio devices to userspace(VM) drivers > - using virtio PMD inside guest > So, what's different for it vs passing through a physical hardware via VFIO? even though the features are negotiated dynamically, could you explain why it would cause software_version not work? > > > I think this thread is discussing about vfio related devices. > > > > > > - device_api: vfio-pci or vfio-ccw ... > > > > - type: mdev type for mdev device or > > > > a signature for physical device which is a counterpart for > > > > mdev type. > > > > > > > > device api specific part: (must) > > > > - pci id: pci id of mdev parent device or pci id of physical pci > > > > device (device_api is vfio-pci)API here. > > > > > > So this assumes a PCI device which is probably not true. > > > > > for device_api of vfio-pci, why it's not true? > > > > for vfio-ccw, it's subchannel_type. > > > Ok but having two different attributes for the same file is not good idea. > How mgmt know there will be a 3rd type? that's why some attributes need to be common. e.g. device_api: it's common because mgmt need to know it's a pci device or a ccw device. and the api type is already defined vfio.h. (The field is agreed by and actually suggested by Alex in previous mail) type: mdev_type for mdev. if mgmt does not understand it, it would not be able to create one compatible mdev device. software_version: mgmt can compare the major and minor if it understands this fields. > > > > > > > > - subchannel_type (device_api is vfio-ccw) > > > > vendor driver specific part: (optional) > > > > - aggregator > > > > - chpid_type > > > > - remote_url > > > > > > For "remote_url", just wonder if it's better to integrate or reuse the > > > existing NVME management interface instead of duplicating it here. Otherwise > > > it could be a burden for mgmt to learn. 
E.g vendor A may use "remote_url" > > > but vendor B may use a different attribute. > > > > > it's vendor driver specific. > > vendor specific attributes are inevitable, and that's why we are > > discussing here of a way to standardizing of it. > > > Well, then you will end up with a very long list to discuss. E.g for > networking devices, you will have "mac", "v(x)lan" and a lot of other. > > Note that "remote_url" is not vendor specific but NVME (class/subsystem) > specific. > yes, it's just NVMe specific. I added it as an example to show what is vendor specific. if one attribute is vendor specific across all vendors, then it's not vendor specific, it's already common attribute, right? > The point is that if vendor/class specific part is unavoidable, why not > making all of the attributes vendor specific? > some parts need to be common, as I listed above. > > > our goal is that mgmt can use it without understanding the meaning of vendor > > specific attributes. > > > I'm not sure this is the correct design of uAPI. Is there something similar > in the existing uAPIs? > > And it might be hard to work for virtio devices. > > > > > > > > NOTE: vendors are free to add attributes in this part with a > > > > restriction that this attribute is able to be configured with the same > > > > name in sysfs too. e.g. > > > > > > Sysfs works well for common attributes belongs to a class, but I'm not sure > > > it can work well for device/vendor specific attributes. Does this mean mgmt > > > need to iterate all the attributes in both src and dst? > > > > > no. just attributes under migration directory. > > > > > > for aggregator, there must be a sysfs attribute in device node > > > > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, > > > > so that the userspace tool is able to configure the target device > > > > according to source device's aggregator attribute. > > > > > > > > > > > > (2) where and structure > > > > proposal 1: > > > > |- [path to device] > > > > |--- migration > > > > | |--- self > > > > | | |-software_version > > > > | | |-device_api > > > > | | |-type > > > > | | |-[pci_id or subchannel_type] > > > > | | |- > > > > | |--- compatible > > > > | | |-software_version > > > > | | |-device_api > > > > | | |-type > > > > | | |-[pci_id or subchannel_type] > > > > | | |- > > > > multiple compatible is allowed. > > > > attributes should be ASCII text files, preferably with only one value > > > > per file. > > > > > > > > > > > > proposal 2: use bin_attribute. > > > > |- [path to device] > > > > |--- migration > > > > | |--- self > > > > | |--- compatible > > > > > > > > so we can continue use multiline format. e.g. > > > > cat compatible > > > > software_version=0.1.0 > > > > device_api=vfio_pci > > > > type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > > pci_id=80865963 > > > > aggregator={val1}/2 > > > > > > So basically two questions: > > > > > > - how hard to standardize sysfs API for dealing with compatibility check (to > > > make it work for most types of devices) > > sorry, I just know we are in the process of standardizing of it :) > > > It's not easy. As I said, the current design can't work for virtio devices > and it's not hard to find other examples. I remember some Intel devices have > bitmask based capability registers. > some Intel devices have bitmask based capability registers. so what? we have defined pci_id to identify the devices. even two different devices have equal PCI IDs, we still allow them to add vendor specific fields. e.g. 
for QAT, they can add alg_set to identify hardware supported algorithms. > > > > > > - how hard for the mgmt to learn with a vendor specific attributes (vs > > > existing management API) > > what is existing management API? > > > It depends on the type of devices. E.g for NVME, we've already had one > (/sys/kernel/config/nvme)? > if the device is binding to vfio or vfio-mdev, I believe this interface is not there. Thanks Yan From jasowang at redhat.com Wed Aug 19 09:28:38 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 17:28:38 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819081338.GC21172@joy-OptiPlex-7040> References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> <20200819081338.GC21172@joy-OptiPlex-7040> Message-ID: On 2020/8/19 下午4:13, Yan Zhao wrote: > On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote: >> On 2020/8/19 下午2:59, Yan Zhao wrote: >>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: >>>> On 2020/8/19 上午11:30, Yan Zhao wrote: >>>>> hi All, >>>>> could we decide that sysfs is the interface that every VFIO vendor driver >>>>> needs to provide in order to support vfio live migration, otherwise the >>>>> userspace management tool would not list the device into the compatible >>>>> list? >>>>> >>>>> if that's true, let's move to the standardizing of the sysfs interface. >>>>> (1) content >>>>> common part: (must) >>>>> - software_version: (in major.minor.bugfix scheme) >>>> This can not work for devices whose features can be negotiated/advertised >>>> independently. (E.g virtio devices) >>>> >>> sorry, I don't understand here, why virtio devices need to use vfio interface? >> >> I don't see any reason that virtio devices can't be used by VFIO. Do you? >> >> Actually, virtio devices have been used by VFIO for many years: >> >> - passthrough a hardware virtio devices to userspace(VM) drivers >> - using virtio PMD inside guest >> > So, what's different for it vs passing through a physical hardware via VFIO? The difference is in the guest, the device could be either real hardware or emulated ones. > even though the features are negotiated dynamically, could you explain > why it would cause software_version not work? Virtio device 1 supports feature A, B, C Virtio device 2 supports feature B, C, D So you can't migrate a guest from device 1 to device 2. And it's impossible to model the features with versions. > > >>> I think this thread is discussing about vfio related devices. >>> >>>>> - device_api: vfio-pci or vfio-ccw ... >>>>> - type: mdev type for mdev device or >>>>> a signature for physical device which is a counterpart for >>>>> mdev type. >>>>> >>>>> device api specific part: (must) >>>>> - pci id: pci id of mdev parent device or pci id of physical pci >>>>> device (device_api is vfio-pci)API here. >>>> So this assumes a PCI device which is probably not true. >>>> >>> for device_api of vfio-pci, why it's not true? >>> >>> for vfio-ccw, it's subchannel_type. >> >> Ok but having two different attributes for the same file is not good idea. >> How mgmt know there will be a 3rd type? > that's why some attributes need to be common. e.g. > device_api: it's common because mgmt need to know it's a pci device or a > ccw device. 
and the api type is already defined vfio.h. > (The field is agreed by and actually suggested by Alex in previous mail) > type: mdev_type for mdev. if mgmt does not understand it, it would not > be able to create one compatible mdev device. > software_version: mgmt can compare the major and minor if it understands > this fields. I think it would be helpful if you can describe how mgmt is expected to work step by step with the proposed sysfs API. This can help people to understand. Thanks for the patience. Since sysfs is uABI, when accepted, we need support it forever. That's why we need to be careful. >> >>>>> - subchannel_type (device_api is vfio-ccw) >>>>> vendor driver specific part: (optional) >>>>> - aggregator >>>>> - chpid_type >>>>> - remote_url >>>> For "remote_url", just wonder if it's better to integrate or reuse the >>>> existing NVME management interface instead of duplicating it here. Otherwise >>>> it could be a burden for mgmt to learn. E.g vendor A may use "remote_url" >>>> but vendor B may use a different attribute. >>>> >>> it's vendor driver specific. >>> vendor specific attributes are inevitable, and that's why we are >>> discussing here of a way to standardizing of it. >> >> Well, then you will end up with a very long list to discuss. E.g for >> networking devices, you will have "mac", "v(x)lan" and a lot of other. >> >> Note that "remote_url" is not vendor specific but NVME (class/subsystem) >> specific. >> > yes, it's just NVMe specific. I added it as an example to show what is > vendor specific. > if one attribute is vendor specific across all vendors, then it's not vendor specific, > it's already common attribute, right? It's common but the issue is about naming and mgmt overhead. Unless you have a unified API per class (NVME, ethernet, etc), you can't prevent vendor from using another name instead of "remote_url". > >> The point is that if vendor/class specific part is unavoidable, why not >> making all of the attributes vendor specific? >> > some parts need to be common, as I listed above. This is hard, unless VFIO knows the type of device (e.g it's a NVME or networking device). > >>> our goal is that mgmt can use it without understanding the meaning of vendor >>> specific attributes. >> >> I'm not sure this is the correct design of uAPI. Is there something similar >> in the existing uAPIs? >> >> And it might be hard to work for virtio devices. >> >> >>>>> NOTE: vendors are free to add attributes in this part with a >>>>> restriction that this attribute is able to be configured with the same >>>>> name in sysfs too. e.g. >>>> Sysfs works well for common attributes belongs to a class, but I'm not sure >>>> it can work well for device/vendor specific attributes. Does this mean mgmt >>>> need to iterate all the attributes in both src and dst? >>>> >>> no. just attributes under migration directory. >>> >>>>> for aggregator, there must be a sysfs attribute in device node >>>>> /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, >>>>> so that the userspace tool is able to configure the target device >>>>> according to source device's aggregator attribute. 
>>>>> >>>>> >>>>> (2) where and structure >>>>> proposal 1: >>>>> |- [path to device] >>>>> |--- migration >>>>> | |--- self >>>>> | | |-software_version >>>>> | | |-device_api >>>>> | | |-type >>>>> | | |-[pci_id or subchannel_type] >>>>> | | |- >>>>> | |--- compatible >>>>> | | |-software_version >>>>> | | |-device_api >>>>> | | |-type >>>>> | | |-[pci_id or subchannel_type] >>>>> | | |- >>>>> multiple compatible is allowed. >>>>> attributes should be ASCII text files, preferably with only one value >>>>> per file. >>>>> >>>>> >>>>> proposal 2: use bin_attribute. >>>>> |- [path to device] >>>>> |--- migration >>>>> | |--- self >>>>> | |--- compatible >>>>> >>>>> so we can continue use multiline format. e.g. >>>>> cat compatible >>>>> software_version=0.1.0 >>>>> device_api=vfio_pci >>>>> type=i915-GVTg_V5_{val1:int:1,2,4,8} >>>>> pci_id=80865963 >>>>> aggregator={val1}/2 >>>> So basically two questions: >>>> >>>> - how hard to standardize sysfs API for dealing with compatibility check (to >>>> make it work for most types of devices) >>> sorry, I just know we are in the process of standardizing of it :) >> >> It's not easy. As I said, the current design can't work for virtio devices >> and it's not hard to find other examples. I remember some Intel devices have >> bitmask based capability registers. >> > some Intel devices have bitmask based capability registers. > so what? You should at least make the proposed API working for your(Intel) own devices. > we have defined pci_id to identify the devices. > even two different devices have equal PCI IDs, we still allow them to > add vendor specific fields. e.g. > for QAT, they can add alg_set to identify hardware supported algorithms. Well, the point is to make sure the API not work only for some specific devices. If we agree with this, we need try to seek what is missed instead. > >>>> - how hard for the mgmt to learn with a vendor specific attributes (vs >>>> existing management API) >>> what is existing management API? >> >> It depends on the type of devices. E.g for NVME, we've already had one >> (/sys/kernel/config/nvme)? >> > if the device is binding to vfio or vfio-mdev, I believe this interface > is not there. So you want to duplicate some APIs with existing NVME ones? Thanks > > > Thanks > Yan > From jasowang at redhat.com Wed Aug 19 09:41:39 2020 From: jasowang at redhat.com (Jason Wang) Date: Wed, 19 Aug 2020 17:41:39 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> Message-ID: On 2020/8/19 下午1:58, Parav Pandit wrote: > >> From: Yan Zhao >> Sent: Wednesday, August 19, 2020 9:01 AM >> On Tue, Aug 18, 2020 at 09:39:24AM +0000, Parav Pandit wrote: >>> Please refer to my previous email which has more example and details. >> hi Parav, >> the example is based on a new vdpa tool running over netlink, not based on >> devlink, right? > Right. > >> For vfio migration compatibility, we have to deal with both mdev and physical >> pci devices, I don't think it's a good idea to write a new tool for it, given we are >> able to retrieve the same info from sysfs and there's already an mdevctl from > mdev attribute should be visible in the mdev's sysfs tree. 
> I do not propose to write a new mdev tool over netlink. I am sorry if I implied that with my suggestion of vdpa tool. > > If underlying device is vdpa, mdev might be able to understand vdpa device and query from it and populate in mdev sysfs tree. Note that vdpa is bus independent so it can't work now and the support of mdev on top of vDPA have been rejected (and duplicated with vhost-vDPA). Thanks > > The vdpa tool I propose is usable even without mdevs. > vdpa tool's role is to create one or more vdpa devices and place on the "vdpa" bus which is the lowest layer here. > Additionally this tool let user query virtqueue stats, db stats. > When a user creates vdpa net device, user may need to configure features of the vdpa device such as VIRTIO_NET_F_MAC, default VIRTIO_NET_F_MTU. > These are vdpa level features, attributes. Mdev is layer above it. > >> Alex >> (https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub. >> com%2Fmdevctl%2Fmdevctl&data=02%7C01%7Cparav%40nvidia.com%7C >> 0c2691d430304f5ea11308d843f2d84e%7C43083d15727340c1b7db39efd9ccc17 >> a%7C0%7C0%7C637334057571911357&sdata=KxH7PwxmKyy9JODut8BWr >> LQyOBylW00%2Fyzc4rEvjUvA%3D&reserved=0). >> > Sorry for above link mangling. Our mail server is still transitioning due to company acquisition. > > I am less familiar on below points to comment. > >> hi All, >> could we decide that sysfs is the interface that every VFIO vendor driver needs >> to provide in order to support vfio live migration, otherwise the userspace >> management tool would not list the device into the compatible list? >> >> if that's true, let's move to the standardizing of the sysfs interface. >> (1) content >> common part: (must) >> - software_version: (in major.minor.bugfix scheme) >> - device_api: vfio-pci or vfio-ccw ... >> - type: mdev type for mdev device or >> a signature for physical device which is a counterpart for >> mdev type. >> >> device api specific part: (must) >> - pci id: pci id of mdev parent device or pci id of physical pci >> device (device_api is vfio-pci) >> - subchannel_type (device_api is vfio-ccw) >> >> vendor driver specific part: (optional) >> - aggregator >> - chpid_type >> - remote_url >> >> NOTE: vendors are free to add attributes in this part with a restriction that this >> attribute is able to be configured with the same name in sysfs too. e.g. >> for aggregator, there must be a sysfs attribute in device node >> /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180- >> 078a62063ab1/intel_vgpu/aggregator, >> so that the userspace tool is able to configure the target device according to >> source device's aggregator attribute. >> >> >> (2) where and structure >> proposal 1: >> |- [path to device] >> |--- migration >> | |--- self >> | | |-software_version >> | | |-device_api >> | | |-type >> | | |-[pci_id or subchannel_type] >> | | |- >> | |--- compatible >> | | |-software_version >> | | |-device_api >> | | |-type >> | | |-[pci_id or subchannel_type] >> | | |- >> multiple compatible is allowed. >> attributes should be ASCII text files, preferably with only one value per file. >> >> >> proposal 2: use bin_attribute. >> |- [path to device] >> |--- migration >> | |--- self >> | |--- compatible >> >> so we can continue use multiline format. e.g. 
>> cat compatible >> software_version=0.1.0 >> device_api=vfio_pci >> type=i915-GVTg_V5_{val1:int:1,2,4,8} >> pci_id=80865963 >> aggregator={val1}/2 >> >> Thanks >> Yan From harishkumarivaturi at gmail.com Wed Aug 19 17:13:14 2020 From: harishkumarivaturi at gmail.com (HARISH KUMAR Ivaturi) Date: Wed, 19 Aug 2020 19:13:14 +0200 Subject: OpenStack with Nginx Message-ID: Hi I am Harish Kumar, Master Student at BTH, Karlskrona, Sweden. I am working on my Master thesis at BTH and my thesis topic is Performance evaluation of OpenStack with HTTP/3. I have successfully built curl and nginx with HTTP/3 support and I am performing some commands using curl for generating tokens so i could access the services of OpenStack. OpenStack relies with the Apache web server and I could not get any results using Nginx HTTP/3 . I would like to ask if there is any official documentation on OpenStack relying with Nginx?, I have searched in the internet reg. this info but could not get any, I would like to use nginx instead of apache web server , so I could get some results by performing curl and commands and nginx web server (with http/3 support). Please let me know and if there is any content please share with me. I hope you have understood this. It would be helpful for my Master Thesis. BR Harish Kumar -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.williamson at redhat.com Wed Aug 19 17:50:21 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Wed, 19 Aug 2020 11:50:21 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819033035.GA21172@joy-OptiPlex-7040> References: <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> Message-ID: <20200819115021.004427a3@x1.home> On Wed, 19 Aug 2020 11:30:35 +0800 Yan Zhao wrote: > On Tue, Aug 18, 2020 at 09:39:24AM +0000, Parav Pandit wrote: > > Hi Cornelia, > > > > > From: Cornelia Huck > > > Sent: Tuesday, August 18, 2020 3:07 PM > > > To: Daniel P. Berrangé > > > Cc: Jason Wang ; Yan Zhao > > > ; kvm at vger.kernel.org; libvir-list at redhat.com; > > > qemu-devel at nongnu.org; Kirti Wankhede ; > > > eauger at redhat.com; xin-ran.wang at intel.com; corbet at lwn.net; openstack- > > > discuss at lists.openstack.org; shaohe.feng at intel.com; kevin.tian at intel.com; > > > Parav Pandit ; jian-feng.ding at intel.com; > > > dgilbert at redhat.com; zhenyuw at linux.intel.com; hejie.xu at intel.com; > > > bao.yumeng at zte.com.cn; Alex Williamson ; > > > eskultet at redhat.com; smooney at redhat.com; intel-gvt- > > > dev at lists.freedesktop.org; Jiri Pirko ; > > > dinechin at redhat.com; devel at ovirt.org > > > Subject: Re: device compatibility interface for live migration with assigned > > > devices > > > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > > > we actually can also retrieve the same information through sysfs, > > > > > .e.g > > > > > > > > > > |- [path to device] > > > > > |--- migration > > > > > | |--- self > > > > > | | |---device_api > > > > > | | |---mdev_type > > > > > | | |---software_version > > > > > | | |---device_id > > > > > | | |---aggregator > > > > > | |--- compatible > > > > > | | |---device_api > > > > > | | |---mdev_type > > > > > | | |---software_version > > > > > | | |---device_id > > > > > | | |---aggregator > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > - Attribute is coupled with kobject > > > > > > Is that really that bad? You have the device with an embedded kobject > > > anyway, and you can just put things into an attribute group? > > > > > > [Also, I think that self/compatible split in the example makes things > > > needlessly complex. Shouldn't semantic versioning and matching already > > > cover nearly everything? I would expect very few cases that are more > > > complex than that. Maybe the aggregation stuff, but I don't think we need > > > that self/compatible split for that, either.] > > > > > > > > > > > > > All of above seems unnecessary. > > > > > > > > > > Another point, as we discussed in another thread, it's really hard > > > > > to make sure the above API work for all types of devices and > > > > > frameworks. So having a vendor specific API looks much better. > > > > > > > > > > From the POV of userspace mgmt apps doing device compat checking / > > > > > migration, we certainly do NOT want to use different vendor > > > > > specific APIs. We want to have an API that can be used / controlled in a > > > standard manner across vendors. > > > > > > > > > > Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a > > > > > long debate on sysfs vs devlink). So if we go with sysfs, at least two > > > > > APIs needs to be supported ... > > > > > > > > NB, I was not questioning devlink vs sysfs directly. If devlink is > > > > related to netlink, I can't say I'm enthusiastic as IMKE sysfs is > > > > easier to deal with. I don't know enough about devlink to have much of an > > > opinion though. > > > > The key point was that I don't want the userspace APIs we need to deal > > > > with to be vendor specific. > > > > > > From what I've seen of devlink, it seems quite nice; but I understand why > > > sysfs might be easier to deal with (especially as there's likely already a lot of > > > code using it.) > > > > > > I understand that some users would like devlink because it is already widely > > > used for network drivers (and some others), but I don't think the majority of > > > devices used with vfio are network (although certainly a lot of them are.) > > > > > > > > > > > What I care about is that we have a *standard* userspace API for > > > > performing device compatibility checking / state migration, for use by > > > > QEMU/libvirt/ OpenStack, such that we can write code without countless > > > > vendor specific code paths. 
> > > > > > > > If there is vendor specific stuff on the side, that's fine as we can > > > > ignore that, but the core functionality for device compat / migration > > > > needs to be standardized. > > > > > > To summarize: > > > - choose one of sysfs or devlink > > > - have a common interface, with a standardized way to add > > > vendor-specific attributes > > > ? > > > > Please refer to my previous email which has more example and details. > hi Parav, > the example is based on a new vdpa tool running over netlink, not based > on devlink, right? > For vfio migration compatibility, we have to deal with both mdev and physical > pci devices, I don't think it's a good idea to write a new tool for it, given > we are able to retrieve the same info from sysfs and there's already an > mdevctl from Alex (https://github.com/mdevctl/mdevctl). > > hi All, > could we decide that sysfs is the interface that every VFIO vendor driver > needs to provide in order to support vfio live migration, otherwise the > userspace management tool would not list the device into the compatible > list? > > if that's true, let's move to the standardizing of the sysfs interface. > (1) content > common part: (must) > - software_version: (in major.minor.bugfix scheme) > - device_api: vfio-pci or vfio-ccw ... > - type: mdev type for mdev device or > a signature for physical device which is a counterpart for > mdev type. > > device api specific part: (must) > - pci id: pci id of mdev parent device or pci id of physical pci > device (device_api is vfio-pci) As noted previously, the parent PCI ID should not matter for an mdev device, if a vendor has a dependency on matching the parent device PCI ID, that's a vendor specific restriction. An mdev device can also expose a vfio-pci device API without the parent device being PCI. For a physical PCI device, shouldn't the PCI ID be encompassed in the signature? Thanks, Alex > - subchannel_type (device_api is vfio-ccw) > > vendor driver specific part: (optional) > - aggregator > - chpid_type > - remote_url > > NOTE: vendors are free to add attributes in this part with a > restriction that this attribute is able to be configured with the same > name in sysfs too. e.g. > for aggregator, there must be a sysfs attribute in device node > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, > so that the userspace tool is able to configure the target device > according to source device's aggregator attribute. > > > (2) where and structure > proposal 1: > |- [path to device] > |--- migration > | |--- self > | | |-software_version > | | |-device_api > | | |-type > | | |-[pci_id or subchannel_type] > | | |- > | |--- compatible > | | |-software_version > | | |-device_api > | | |-type > | | |-[pci_id or subchannel_type] > | | |- > multiple compatible is allowed. > attributes should be ASCII text files, preferably with only one value > per file. > > > proposal 2: use bin_attribute. > |- [path to device] > |--- migration > | |--- self > | |--- compatible > > so we can continue use multiline format. e.g. 
> cat compatible > software_version=0.1.0 > device_api=vfio_pci > type=i915-GVTg_V5_{val1:int:1,2,4,8} > pci_id=80865963 > aggregator={val1}/2 > > Thanks > Yan > From sean.mcginnis at gmx.com Wed Aug 19 18:32:40 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 19 Aug 2020 13:32:40 -0500 Subject: OpenStack with Nginx In-Reply-To: References: Message-ID: On 8/19/20 12:13 PM, HARISH KUMAR Ivaturi wrote: > Hi > I am Harish Kumar, Master Student at BTH, Karlskrona, Sweden. I am > working on my Master thesis at BTH and my thesis topic is Performance > evaluation of OpenStack with HTTP/3. Welcome Harish! That should be interesting to see the results of your evaluation. I hope you will share that with the community once you complete your research. > > I have successfully built curl and nginx with HTTP/3 support and I am > performing some commands using curl for generating tokens so i could > access the services of OpenStack. > OpenStack relies with the Apache web server and I could not get any > results using Nginx HTTP/3 . I would like to ask if there is any > official documentation on OpenStack relying with Nginx?, I have > searched in the internet reg. this info but could not get any, I would > like to use nginx instead of apache web server , so I could get some > results by performing curl and commands and nginx web server (with > http/3 support). Please let me know and if there is any content please > share with me. I hope you have understood this. It would be helpful > for my Master Thesis. > I haven't really done anything with HTTP/3, but from what I understand, it just changes the transport to use QUIC. So that should be pretty transparent as far as the OpenStack services are concerned. We don't have any documentation that I know of. Unless someone has done some of their own testing and has some notes they can share. I think the main thing here would be just setting up nginx to use the uWSGI apps rather than Apache. This seems like a promising article that walks through configuring nginx: https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-uwsgi-and-nginx-on-ubuntu-20-04 That specifically references flask, so just keep in mind that most OpenStack services do not use that part of the tutorial. Cinder has some old notes from when we were first looking at running behind Apache. Those can be found here: https://docs.openstack.org/cinder/latest/contributor/api.apache.html But you may just need to look at the existing Apache configuration and figure out what to change to do the equivalent under nginx. Good luck! Sean From whayutin at redhat.com Thu Aug 20 02:08:48 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 19 Aug 2020 20:08:48 -0600 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: <1597847922905.32607@binero.com> References: <1597847922905.32607@binero.com> Message-ID: On Wed, Aug 19, 2020 at 8:40 AM Tobias Urdin wrote: > Big +1 from an outsider :)) > > > Best regards > > Tobias > > > ------------------------------ > *From:* Rabi Mishra > *Sent:* Wednesday, August 19, 2020 3:37 PM > *To:* Emilien Macchi > *Cc:* openstack-discuss > *Subject:* Re: [tripleo] Proposing Takashi Kajinami to be core on > puppet-tripleo > > +1 > > On Tue, Aug 18, 2020 at 8:03 PM Emilien Macchi wrote: > >> Hi people, >> >> If you don't know Takashi yet, he has been involved in the Puppet >> OpenStack project and helped *a lot* in its maintenance (and by maintenance >> I mean not-funny-work). 
When our community was getting smaller and smaller, >> he joined us and our review velicity went back to eleven. He became a core >> maintainer very quickly and we're glad to have him onboard. >> >> He's also been involved in taking care of puppet-tripleo for a few months >> and I believe he has more than enough knowledge on the module to provide >> core reviews and be part of the core maintainer group. I also noticed his >> amount of contribution (bug fixes, improvements, reviews, etc) in other >> TripleO repos and I'm confident he'll make his road to be core in TripleO >> at some point. For now I would like him to propose him to be core in >> puppet-tripleo. >> >> As usual, any feedback is welcome but in the meantime I want to thank >> Takashi for his work in TripleO and we're super happy to have new >> contributors! >> >> Thanks, >> -- >> Emilien Macchi >> > > > -- > Regards, > Rabi Mishra > > +1, thanks for your contributions Takashi! -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Thu Aug 20 08:22:06 2020 From: eblock at nde.ag (Eugen Block) Date: Thu, 20 Aug 2020 08:22:06 +0000 Subject: [neutron] Disable dhcp drop rule In-Reply-To: <20200819164211.Horde.jx_dhmZz16BL7k9bIumarOA@webmail.nde.ag> References: <20200819133616.Horde.zhXC_mhe4RdzjbP4Shl1M45@webmail.nde.ag> <4ea4eb17-0373-e1ab-6f45-c35cb67723e0@nemebean.com> <20200819164211.Horde.jx_dhmZz16BL7k9bIumarOA@webmail.nde.ag> Message-ID: <20200820082206.Horde.cXRYpICP4lCwZzX-6gHKj-q@webmail.nde.ag> Hi, just a quick follow-up on this: disabling port_security only on the specified port works as expected. Although this is still not an optimal solution we can live with it for now. Thanks again and best regards, Eugen Zitat von Eugen Block : > That sounds promising, thank you! I had noticed that option but > didn’t have a chance to look closer into it. > I’ll try that tomorrow. > > Thanks for the tip! > > Zitat von Ben Nemec : > >> On 8/19/20 8:36 AM, Eugen Block wrote: >>> Hi *, >>> >>> we recently upgraded our Ocata Cloud to Train and also switched >>> from linuxbridge to openvswitch. >>> >>> One of our instances within the cloud works as DHCP server and to >>> make that work we had to comment the respective part in this file >>> on the compute node the instance was running on: >>> >>> /usr/lib/python2.7/site-packages/neutron/agent/linux/iptables_firewall.py >>> >>> >>> Now we tried the same in >>> >>> /usr/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py >>> /usr/lib/python3.6/site-packages/neutron/agent/linux/iptables_firewall.py >>> >>> but restarting openstack-neutron-openvswitch-agent.service didn't >>> drop that rule, the DHCP reply didn't get through. To continue >>> with our work we just dropped it manually, so we get by, but since >>> there have been a couple of years between Ocata and Train, is >>> there any smoother or better way to achieve this? This seems to be >>> a reoccuring request but I couldn't find any updates on this >>> topic. Maybe someone here can shed some light? Is there more to >>> change than those two files I mentioned? >> >> You might try disabling port-security on the instance's port. >> That's what we use in OVB to allow a DHCP server in an instance now. >> >> neutron port-update [port-id] --port_security_enabled=False >> >> That will drop all port security for that instance, not just the >> DHCP rule, but on the other hand it leaves the DHCP rule in place >> for any instances you don't want running DHCP servers. 
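(Side note: the same toggle should also be exposed through the unified client, i.e. "openstack port set --disable-port-security <port-id>", with "openstack port set --enable-port-security <port-id>" to turn it back on -- worth double-checking against the python-openstackclient version in use.)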
>> >>> >>> Any pointers are highly appreciated! >>> >>> Best regards, >>> Eugen >>> >>> From dtantsur at redhat.com Thu Aug 20 08:54:17 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Thu, 20 Aug 2020 10:54:17 +0200 Subject: [ironic] RFC: deprecate the iSCSI deploy interface? Message-ID: Hi all, Side note for those lacking context: this proposal concerns deprecating one of the ironic deploy interfaces detailed in https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It does not affect the boot-from-iSCSI feature. I would like to propose deprecating and removing the 'iscsi' deploy interface over the course of the next 2 cycles. The reasons are: 1) The iSCSI deploy is a source of occasional cryptic bugs when a target cannot be discovered or mounted properly. 2) Its security is questionable: I don't think we even use authentication. 3) Operators confusion: right now we default to the iSCSI deploy but pretty much direct everyone who cares about scalability or security to the 'direct' deploy. 4) Cost of maintenance: our feature set is growing, our team - not so much. iscsi_deploy.py is 800 lines of code that can be removed, and some dependencies that can be dropped as well. As far as I can remember, we've kept the iSCSI deploy for two reasons: 1) The direct deploy used to require Glance with Swift backend. The recently added [agent]image_download_source option allows caching and serving images via the ironic's HTTP server, eliminating this problem. I guess we'll have to switch to 'http' by default for this option to keep the out-of-box experience. 2) Memory footprint of the direct deploy. With the raw images streaming we no longer have to cache the downloaded images in the agent memory, removing this problem as well (I'm not even sure how much of a problem it is in 2020, even my phone has 4GiB of RAM). If this proposal is accepted, I suggest to execute it as follows: Victoria release: 1) Put an early deprecation warning in the release notes. 2) Announce the future change of the default value for [agent]image_download_source. W release: 3) Change [agent]image_download_source to 'http' by default. 4) Remove iscsi from the default enabled_deploy_interfaces and move it to the back of the supported list (effectively making direct deploy the default). X release: 5) Remove the iscsi deploy code from both ironic and IPA. Thoughts, opinions, suggestions? Dmitry -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjeanner at redhat.com Thu Aug 20 12:22:59 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Thu, 20 Aug 2020 14:22:59 +0200 Subject: [tripleo] Moving tripleo-ansible-inventory script to tripleo-common? In-Reply-To: References: <0e91db84-723b-d14b-d654-fdc74a0a42eb@redhat.com> Message-ID: <0290a47a-4e84-76a1-e3aa-b0993191f6ff@redhat.com> On 8/18/20 10:03 AM, Cédric Jeanneret wrote: > > > On 8/18/20 9:53 AM, Rabi Mishra wrote: >> >> >> On Tue, Aug 18, 2020 at 1:07 PM Cédric Jeanneret > > wrote: >> >> Hello there! >> >> I'm wondering if we could move the "tripleo-ansible-inventory" script >> from the tripleo-validations repo to tripleo-common. >> >> >> TBH, I don't know the history, but it would be better if we remove all >> scripts from tripleo-common and use it just as a utility library (now >> that Mistral is gone). Most of the existing scripts probably have an >> existing command in tripleoclient. We can implement  missing ones >> including "tripleo-ansible-inventory" in python-tripleoclient. 
hm, we can't really replace it imho, since it's used as a "dynamic inventory" for ansible directly. The best thing we can probably do is: - add "--os-cloud" option support[1] - move this script to tripleoclient (I agree with you regarding tripleo-common) Once this is done, *maybe* we can move things to tripleoclient itself, but we'll need to do something in order to keep that script in place anyway... [1] Thanks Mathieu :) https://review.opendev.org/747140 > > would probably be better to implement it directly in tripleoclient imho. > In any cases, it has nothing to do in tripleo-validations... > > I can't connect to launchpad, they are having some auth issue, I can't > create an RFE there :(. > >> >> >> The main motivation here is to make things consistent: >> - that script calls content from tripleo-common, nothing from >> tripleo-validations. >> - that script isn't only for the validations, so it makes more sense to >> install it via tripleo-common >> - in fact, we should probably push that inventory thing as an `openstack >> tripleo' sub-command, but that's another story >> >> So, is there any opposition to this proposal? >> >> Cheers, >> >> C. >> >> >> -- >> Cédric Jeanneret (He/Him/His) >> Sr. Software Engineer - OpenStack Platform >> Deployment Framework TC >> Red Hat EMEA >> https://www.redhat.com/ >> >> >> >> -- >> Regards, >> Rabi Mishra >> > -- Cédric Jeanneret (He/Him/His) Sr. Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From root.mch at gmail.com Thu Aug 20 12:46:25 2020 From: root.mch at gmail.com (=?UTF-8?Q?=C4=B0zzettin_Erdem?=) Date: Thu, 20 Aug 2020 15:46:25 +0300 Subject: [MURANO] Murano Class error when try to deploy WordPress APP Message-ID: Hello everyone, WordPress needs Mysql, HTTP and Zabbix Server/Agent. These apps run individually with succes but when I try to deploy WordPress App on Murano it gives the error about Apache HTTP that mentioned below. How can I fix this? Do you have any suggestions? Error: http://paste.openstack.org/show/796980/ http://paste.openstack.org/show/796983/ (cont.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rajini.Karthik at Dell.com Thu Aug 20 13:25:09 2020 From: Rajini.Karthik at Dell.com (Karthik, Rajini) Date: Thu, 20 Aug 2020 13:25:09 +0000 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: <1597847922905.32607@binero.com> Message-ID: +1 . Rajini From: Wesley Hayutin Sent: Wednesday, August 19, 2020 9:09 PM To: openstack-discuss Cc: Emilien Macchi Subject: Re: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo [EXTERNAL EMAIL] On Wed, Aug 19, 2020 at 8:40 AM Tobias Urdin > wrote: Big +1 from an outsider :)) Best regards Tobias ________________________________ From: Rabi Mishra > Sent: Wednesday, August 19, 2020 3:37 PM To: Emilien Macchi Cc: openstack-discuss Subject: Re: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo +1 On Tue, Aug 18, 2020 at 8:03 PM Emilien Macchi > wrote: Hi people, If you don't know Takashi yet, he has been involved in the Puppet OpenStack project and helped *a lot* in its maintenance (and by maintenance I mean not-funny-work). When our community was getting smaller and smaller, he joined us and our review velicity went back to eleven. 
He became a core maintainer very quickly and we're glad to have him onboard. He's also been involved in taking care of puppet-tripleo for a few months and I believe he has more than enough knowledge on the module to provide core reviews and be part of the core maintainer group. I also noticed his amount of contribution (bug fixes, improvements, reviews, etc) in other TripleO repos and I'm confident he'll make his road to be core in TripleO at some point. For now I would like him to propose him to be core in puppet-tripleo. As usual, any feedback is welcome but in the meantime I want to thank Takashi for his work in TripleO and we're super happy to have new contributors! Thanks, -- Emilien Macchi -- Regards, Rabi Mishra +1, thanks for your contributions Takashi! -------------- next part -------------- An HTML attachment was scrubbed... URL: From yan.y.zhao at intel.com Thu Aug 20 00:18:10 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 20 Aug 2020 08:18:10 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819115021.004427a3@x1.home> References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819115021.004427a3@x1.home> Message-ID: <20200820001810.GD21172@joy-OptiPlex-7040> On Wed, Aug 19, 2020 at 11:50:21AM -0600, Alex Williamson wrote: <...> > > > > > What I care about is that we have a *standard* userspace API for > > > > > performing device compatibility checking / state migration, for use by > > > > > QEMU/libvirt/ OpenStack, such that we can write code without countless > > > > > vendor specific code paths. > > > > > > > > > > If there is vendor specific stuff on the side, that's fine as we can > > > > > ignore that, but the core functionality for device compat / migration > > > > > needs to be standardized. > > > > > > > > To summarize: > > > > - choose one of sysfs or devlink > > > > - have a common interface, with a standardized way to add > > > > vendor-specific attributes > > > > ? > > > > > > Please refer to my previous email which has more example and details. > > hi Parav, > > the example is based on a new vdpa tool running over netlink, not based > > on devlink, right? > > For vfio migration compatibility, we have to deal with both mdev and physical > > pci devices, I don't think it's a good idea to write a new tool for it, given > > we are able to retrieve the same info from sysfs and there's already an > > mdevctl from Alex (https://github.com/mdevctl/mdevctl). > > > > hi All, > > could we decide that sysfs is the interface that every VFIO vendor driver > > needs to provide in order to support vfio live migration, otherwise the > > userspace management tool would not list the device into the compatible > > list? > > > > if that's true, let's move to the standardizing of the sysfs interface. > > (1) content > > common part: (must) > > - software_version: (in major.minor.bugfix scheme) > > - device_api: vfio-pci or vfio-ccw ... > > - type: mdev type for mdev device or > > a signature for physical device which is a counterpart for > > mdev type. 
> > > > device api specific part: (must) > > - pci id: pci id of mdev parent device or pci id of physical pci > > device (device_api is vfio-pci) > > As noted previously, the parent PCI ID should not matter for an mdev > device, if a vendor has a dependency on matching the parent device PCI > ID, that's a vendor specific restriction. An mdev device can also > expose a vfio-pci device API without the parent device being PCI. For > a physical PCI device, shouldn't the PCI ID be encompassed in the > signature? Thanks, > you are right. I need to put the PCI ID as a vendor specific field. I didn't do that because I wanted all fields in vendor specific to be configurable by management tools, so they can configure the target device according to the value of a vendor specific field even they don't know the meaning of the field. But maybe they can just ignore the field when they can't find a matching writable field to configure the target. Thanks Yan > > - subchannel_type (device_api is vfio-ccw) > > > > vendor driver specific part: (optional) > > - aggregator > > - chpid_type > > - remote_url > > > > NOTE: vendors are free to add attributes in this part with a > > restriction that this attribute is able to be configured with the same > > name in sysfs too. e.g. > > for aggregator, there must be a sysfs attribute in device node > > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, > > so that the userspace tool is able to configure the target device > > according to source device's aggregator attribute. > > > > > > (2) where and structure > > proposal 1: > > |- [path to device] > > |--- migration > > | |--- self > > | | |-software_version > > | | |-device_api > > | | |-type > > | | |-[pci_id or subchannel_type] > > | | |- > > | |--- compatible > > | | |-software_version > > | | |-device_api > > | | |-type > > | | |-[pci_id or subchannel_type] > > | | |- > > multiple compatible is allowed. > > attributes should be ASCII text files, preferably with only one value > > per file. > > > > > > proposal 2: use bin_attribute. > > |- [path to device] > > |--- migration > > | |--- self > > | |--- compatible > > > > so we can continue use multiline format. e.g. > > cat compatible > > software_version=0.1.0 > > device_api=vfio_pci > > type=i915-GVTg_V5_{val1:int:1,2,4,8} > > pci_id=80865963 > > aggregator={val1}/2 > > > > Thanks > > Yan > > > From yan.y.zhao at intel.com Thu Aug 20 00:39:22 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 20 Aug 2020 08:39:22 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200818113652.5d81a392.cohuck@redhat.com> References: <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> Message-ID: <20200820003922.GE21172@joy-OptiPlex-7040> On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > On Tue, 18 Aug 2020 10:16:28 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > |- [path to device] > > > |--- migration > > > | |--- self > > > | | |---device_api > > > | | |---mdev_type > > > | | |---software_version > > > | | |---device_id > > > | | |---aggregator > > > | |--- compatible > > > | | |---device_api > > > | | |---mdev_type > > > | | |---software_version > > > | | |---device_id > > > | | |---aggregator > > > > > > > > > Yes but: > > > > > > - You need one file per attribute (one syscall for one attribute) > > > - Attribute is coupled with kobject > > Is that really that bad? You have the device with an embedded kobject > anyway, and you can just put things into an attribute group? > > [Also, I think that self/compatible split in the example makes things > needlessly complex. Shouldn't semantic versioning and matching already > cover nearly everything? I would expect very few cases that are more > complex than that. Maybe the aggregation stuff, but I don't think we > need that self/compatible split for that, either.] Hi Cornelia, The reason I want to declare compatible list of attributes is that sometimes it's not a simple 1:1 matching of source attributes and target attributes as I demonstrated below, source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), (mdev_type i915-GVTg_V5_8 + aggregator 4) and aggragator may be just one of such examples that 1:1 matching does not fit. So, we explicitly list out self/compatible attributes, and management tools only need to check if self attributes is contained compatible attributes. or do you mean only compatible list is enough, and the management tools need to find out self list by themselves? But I think provide a self list is easier for management tools. Thanks Yan From smooney at redhat.com Thu Aug 20 01:29:07 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 20 Aug 2020 02:29:07 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820003922.GE21172@joy-OptiPlex-7040> References: <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> Message-ID: <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > On Tue, 18 Aug 2020 10:16:28 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > |- [path to device] > > > > |--- migration > > > > | |--- self > > > > | | |---device_api > > > > | | |---mdev_type > > > > | | |---software_version > > > > | | |---device_id > > > > | | |---aggregator > > > > | |--- compatible > > > > | | |---device_api > > > > | | |---mdev_type > > > > | | |---software_version > > > > | | |---device_id > > > > | | |---aggregator > > > > > > > > > > > > Yes but: > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > - Attribute is coupled with kobject > > > > Is that really that bad? You have the device with an embedded kobject > > anyway, and you can just put things into an attribute group? > > > > [Also, I think that self/compatible split in the example makes things > > needlessly complex. Shouldn't semantic versioning and matching already > > cover nearly everything? I would expect very few cases that are more > > complex than that. Maybe the aggregation stuff, but I don't think we > > need that self/compatible split for that, either.] > > Hi Cornelia, > > The reason I want to declare compatible list of attributes is that > sometimes it's not a simple 1:1 matching of source attributes and target attributes > as I demonstrated below, > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > (mdev_type i915-GVTg_V5_8 + aggregator 4) the way you are doing the nameing is till really confusing by the way if this has not already been merged in the kernel can you chagne the mdev so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device currently you need to deived the aggratod by the number at the end of the mdev type to figure out how much of the phsicial device is being used with is a very unfridly api convention the way aggrator are being proposed in general is not really someting i like but i thin this at least is something that should be able to correct. with the complexity in the mdev type name + aggrator i suspect that this will never be support in openstack nova directly requireing integration via cyborg unless we can pre partion the device in to mdevs staicaly and just ignore this. this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee taht how aggreator work will be portable across vendors genericly. > > and aggragator may be just one of such examples that 1:1 matching does not > fit. for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change. i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg if you want to have muplie mdev type to model the different amoutn of the resouce e.g. 
i915-GVTg_small i915-GVTg_large that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user level with some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an aggreateion of multiple sub resouces each of which is an mdev. so kind of like how bond port work. we would create an mdev for each of the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only the aggreated mdev to the instance. the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms on top of it even to boot them let alone migrate them. > > So, we explicitly list out self/compatible attributes, and management > tools only need to check if self attributes is contained compatible > attributes. > > or do you mean only compatible list is enough, and the management tools > need to find out self list by themselves? > But I think provide a self list is easier for management tools. > > Thanks > Yan > From alex.williamson at redhat.com Thu Aug 20 03:13:45 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Wed, 19 Aug 2020 21:13:45 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820001810.GD21172@joy-OptiPlex-7040> References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819115021.004427a3@x1.home> <20200820001810.GD21172@joy-OptiPlex-7040> Message-ID: <20200819211345.0d9daf03@x1.home> On Thu, 20 Aug 2020 08:18:10 +0800 Yan Zhao wrote: > On Wed, Aug 19, 2020 at 11:50:21AM -0600, Alex Williamson wrote: > <...> > > > > > > What I care about is that we have a *standard* userspace API for > > > > > > performing device compatibility checking / state migration, for use by > > > > > > QEMU/libvirt/ OpenStack, such that we can write code without countless > > > > > > vendor specific code paths. > > > > > > > > > > > > If there is vendor specific stuff on the side, that's fine as we can > > > > > > ignore that, but the core functionality for device compat / migration > > > > > > needs to be standardized. > > > > > > > > > > To summarize: > > > > > - choose one of sysfs or devlink > > > > > - have a common interface, with a standardized way to add > > > > > vendor-specific attributes > > > > > ? > > > > > > > > Please refer to my previous email which has more example and details. > > > hi Parav, > > > the example is based on a new vdpa tool running over netlink, not based > > > on devlink, right? > > > For vfio migration compatibility, we have to deal with both mdev and physical > > > pci devices, I don't think it's a good idea to write a new tool for it, given > > > we are able to retrieve the same info from sysfs and there's already an > > > mdevctl from Alex (https://github.com/mdevctl/mdevctl). > > > > > > hi All, > > > could we decide that sysfs is the interface that every VFIO vendor driver > > > needs to provide in order to support vfio live migration, otherwise the > > > userspace management tool would not list the device into the compatible > > > list? > > > > > > if that's true, let's move to the standardizing of the sysfs interface. 
> > > (1) content > > > common part: (must) > > > - software_version: (in major.minor.bugfix scheme) > > > - device_api: vfio-pci or vfio-ccw ... > > > - type: mdev type for mdev device or > > > a signature for physical device which is a counterpart for > > > mdev type. > > > > > > device api specific part: (must) > > > - pci id: pci id of mdev parent device or pci id of physical pci > > > device (device_api is vfio-pci) > > > > As noted previously, the parent PCI ID should not matter for an mdev > > device, if a vendor has a dependency on matching the parent device PCI > > ID, that's a vendor specific restriction. An mdev device can also > > expose a vfio-pci device API without the parent device being PCI. For > > a physical PCI device, shouldn't the PCI ID be encompassed in the > > signature? Thanks, > > > you are right. I need to put the PCI ID as a vendor specific field. > I didn't do that because I wanted all fields in vendor specific to be > configurable by management tools, so they can configure the target device > according to the value of a vendor specific field even they don't know > the meaning of the field. > But maybe they can just ignore the field when they can't find a matching > writable field to configure the target. If fields can be ignored, what's the point of reporting them? Seems it's no longer a requirement. Thanks, Alex > > > - subchannel_type (device_api is vfio-ccw) > > > > > > vendor driver specific part: (optional) > > > - aggregator > > > - chpid_type > > > - remote_url > > > > > > NOTE: vendors are free to add attributes in this part with a > > > restriction that this attribute is able to be configured with the same > > > name in sysfs too. e.g. > > > for aggregator, there must be a sysfs attribute in device node > > > /sys/devices/pci0000:00/0000:00:02.0/882cc4da-dede-11e7-9180-078a62063ab1/intel_vgpu/aggregator, > > > so that the userspace tool is able to configure the target device > > > according to source device's aggregator attribute. > > > > > > > > > (2) where and structure > > > proposal 1: > > > |- [path to device] > > > |--- migration > > > | |--- self > > > | | |-software_version > > > | | |-device_api > > > | | |-type > > > | | |-[pci_id or subchannel_type] > > > | | |- > > > | |--- compatible > > > | | |-software_version > > > | | |-device_api > > > | | |-type > > > | | |-[pci_id or subchannel_type] > > > | | |- > > > multiple compatible is allowed. > > > attributes should be ASCII text files, preferably with only one value > > > per file. > > > > > > > > > proposal 2: use bin_attribute. > > > |- [path to device] > > > |--- migration > > > | |--- self > > > | |--- compatible > > > > > > so we can continue use multiline format. e.g. 
> > > cat compatible > > > software_version=0.1.0 > > > device_api=vfio_pci > > > type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > pci_id=80865963 > > > aggregator={val1}/2 > > > > > > Thanks > > > Yan > > > > > > From alex.williamson at redhat.com Thu Aug 20 03:22:34 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Wed, 19 Aug 2020 21:22:34 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820003922.GE21172@joy-OptiPlex-7040> References: <20200805093338.GC30485@joy-OptiPlex-7040> <20200805105319.GF2177@nanopsycho> <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> Message-ID: <20200819212234.223667b3@x1.home> On Thu, 20 Aug 2020 08:39:22 +0800 Yan Zhao wrote: > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > On Tue, 18 Aug 2020 10:16:28 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > |- [path to device] > > > > |--- migration > > > > | |--- self > > > > | | |---device_api > > > > | | |---mdev_type > > > > | | |---software_version > > > > | | |---device_id > > > > | | |---aggregator > > > > | |--- compatible > > > > | | |---device_api > > > > | | |---mdev_type > > > > | | |---software_version > > > > | | |---device_id > > > > | | |---aggregator > > > > > > > > > > > > Yes but: > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > - Attribute is coupled with kobject > > > > Is that really that bad? You have the device with an embedded kobject > > anyway, and you can just put things into an attribute group? > > > > [Also, I think that self/compatible split in the example makes things > > needlessly complex. Shouldn't semantic versioning and matching already > > cover nearly everything? I would expect very few cases that are more > > complex than that. Maybe the aggregation stuff, but I don't think we > > need that self/compatible split for that, either.] > Hi Cornelia, > > The reason I want to declare compatible list of attributes is that > sometimes it's not a simple 1:1 matching of source attributes and target attributes > as I demonstrated below, > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > and aggragator may be just one of such examples that 1:1 matching does not > fit. If you're suggesting that we need a new 'compatible' set for every aggregation, haven't we lost the purpose of aggregation? For example, rather than having N mdev types to represent all the possible aggregation values, we have a single mdev type with N compatible migration entries, one for each possible aggregation value. BTW, how do we have multiple compatible directories? compatible0001, compatible0002? 
Thanks, Alex From yan.y.zhao at intel.com Thu Aug 20 03:09:51 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 20 Aug 2020 11:09:51 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819211345.0d9daf03@x1.home> References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819115021.004427a3@x1.home> <20200820001810.GD21172@joy-OptiPlex-7040> <20200819211345.0d9daf03@x1.home> Message-ID: <20200820030951.GA24121@joy-OptiPlex-7040> On Wed, Aug 19, 2020 at 09:13:45PM -0600, Alex Williamson wrote: > On Thu, 20 Aug 2020 08:18:10 +0800 > Yan Zhao wrote: > > > On Wed, Aug 19, 2020 at 11:50:21AM -0600, Alex Williamson wrote: > > <...> > > > > > > > What I care about is that we have a *standard* userspace API for > > > > > > > performing device compatibility checking / state migration, for use by > > > > > > > QEMU/libvirt/ OpenStack, such that we can write code without countless > > > > > > > vendor specific code paths. > > > > > > > > > > > > > > If there is vendor specific stuff on the side, that's fine as we can > > > > > > > ignore that, but the core functionality for device compat / migration > > > > > > > needs to be standardized. > > > > > > > > > > > > To summarize: > > > > > > - choose one of sysfs or devlink > > > > > > - have a common interface, with a standardized way to add > > > > > > vendor-specific attributes > > > > > > ? > > > > > > > > > > Please refer to my previous email which has more example and details. > > > > hi Parav, > > > > the example is based on a new vdpa tool running over netlink, not based > > > > on devlink, right? > > > > For vfio migration compatibility, we have to deal with both mdev and physical > > > > pci devices, I don't think it's a good idea to write a new tool for it, given > > > > we are able to retrieve the same info from sysfs and there's already an > > > > mdevctl from Alex (https://github.com/mdevctl/mdevctl). > > > > > > > > hi All, > > > > could we decide that sysfs is the interface that every VFIO vendor driver > > > > needs to provide in order to support vfio live migration, otherwise the > > > > userspace management tool would not list the device into the compatible > > > > list? > > > > > > > > if that's true, let's move to the standardizing of the sysfs interface. > > > > (1) content > > > > common part: (must) > > > > - software_version: (in major.minor.bugfix scheme) > > > > - device_api: vfio-pci or vfio-ccw ... > > > > - type: mdev type for mdev device or > > > > a signature for physical device which is a counterpart for > > > > mdev type. > > > > > > > > device api specific part: (must) > > > > - pci id: pci id of mdev parent device or pci id of physical pci > > > > device (device_api is vfio-pci) > > > > > > As noted previously, the parent PCI ID should not matter for an mdev > > > device, if a vendor has a dependency on matching the parent device PCI > > > ID, that's a vendor specific restriction. An mdev device can also > > > expose a vfio-pci device API without the parent device being PCI. For > > > a physical PCI device, shouldn't the PCI ID be encompassed in the > > > signature? Thanks, > > > > > you are right. I need to put the PCI ID as a vendor specific field. 
> > I didn't do that because I wanted all fields in vendor specific to be > > configurable by management tools, so they can configure the target device > > according to the value of a vendor specific field even they don't know > > the meaning of the field. > > But maybe they can just ignore the field when they can't find a matching > > writable field to configure the target. > > > If fields can be ignored, what's the point of reporting them? Seems > it's no longer a requirement. Thanks, > sorry about the confusion. I mean this condition: about to migrate, openstack searches if there are existing matching MDEVs, if yes, i.e. all common/vendor specific fields match, then just create a VM with the matching target MDEV. (in this condition, the PCI ID field is not ignored); if not, openstack tries to create one MDEV according to mdev_type, and configures MDEV according to the vendor specific attributes. as PCI ID is not a configurable field, it just ignore the field. Thanks Yan From yan.y.zhao at intel.com Thu Aug 20 03:16:21 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 20 Aug 2020 11:16:21 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200819212234.223667b3@x1.home> References: <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <20200819212234.223667b3@x1.home> Message-ID: <20200820031621.GA24997@joy-OptiPlex-7040> On Wed, Aug 19, 2020 at 09:22:34PM -0600, Alex Williamson wrote: > On Thu, 20 Aug 2020 08:39:22 +0800 > Yan Zhao wrote: > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > |- [path to device] > > > > > |--- migration > > > > > | |--- self > > > > > | | |---device_api > > > > > | | |---mdev_type > > > > > | | |---software_version > > > > > | | |---device_id > > > > > | | |---aggregator > > > > > | |--- compatible > > > > > | | |---device_api > > > > > | | |---mdev_type > > > > > | | |---software_version > > > > > | | |---device_id > > > > > | | |---aggregator > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > - Attribute is coupled with kobject > > > > > > Is that really that bad? You have the device with an embedded kobject > > > anyway, and you can just put things into an attribute group? > > > > > > [Also, I think that self/compatible split in the example makes things > > > needlessly complex. Shouldn't semantic versioning and matching already > > > cover nearly everything? I would expect very few cases that are more > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > need that self/compatible split for that, either.] 
> > Hi Cornelia, > > > > The reason I want to declare compatible list of attributes is that > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > as I demonstrated below, > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > > > and aggragator may be just one of such examples that 1:1 matching does not > > fit. > > If you're suggesting that we need a new 'compatible' set for every > aggregation, haven't we lost the purpose of aggregation? For example, > rather than having N mdev types to represent all the possible > aggregation values, we have a single mdev type with N compatible > migration entries, one for each possible aggregation value. BTW, how do > we have multiple compatible directories? compatible0001, > compatible0002? Thanks, > do you think the bin_attribute I proposed yesterday good? Then we can have a single compatible with a variable in the mdev_type and aggregator. mdev_type=i915-GVTg_V5_{val1:int:2,4,8} aggregator={val1}/2 Thanks Yan From yan.y.zhao at intel.com Thu Aug 20 04:01:16 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 20 Aug 2020 12:01:16 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> References: <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> Message-ID: <20200820040116.GB24121@joy-OptiPlex-7040> On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote: > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > |- [path to device] > > > > > |--- migration > > > > > | |--- self > > > > > | | |---device_api > > > > > | | |---mdev_type > > > > > | | |---software_version > > > > > | | |---device_id > > > > > | | |---aggregator > > > > > | |--- compatible > > > > > | | |---device_api > > > > > | | |---mdev_type > > > > > | | |---software_version > > > > > | | |---device_id > > > > > | | |---aggregator > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > - Attribute is coupled with kobject > > > > > > Is that really that bad? You have the device with an embedded kobject > > > anyway, and you can just put things into an attribute group? > > > > > > [Also, I think that self/compatible split in the example makes things > > > needlessly complex. Shouldn't semantic versioning and matching already > > > cover nearly everything? 
I would expect very few cases that are more > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > need that self/compatible split for that, either.] > > > > Hi Cornelia, > > > > The reason I want to declare compatible list of attributes is that > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > as I demonstrated below, > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > the way you are doing the nameing is till really confusing by the way > if this has not already been merged in the kernel can you chagne the mdev > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device > > currently you need to deived the aggratod by the number at the end of the mdev type to figure out > how much of the phsicial device is being used with is a very unfridly api convention > > the way aggrator are being proposed in general is not really someting i like but i thin this at least > is something that should be able to correct. > > with the complexity in the mdev type name + aggrator i suspect that this will never be support > in openstack nova directly requireing integration via cyborg unless we can pre partion the > device in to mdevs staicaly and just ignore this. > > this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee > taht how aggreator work will be portable across vendors genericly. > > > > > and aggragator may be just one of such examples that 1:1 matching does not > > fit. > for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change. > hi Sean, I understand it's hard for openstack. but 1:N is always meaningful. e.g. if source device 1 has cap A, it is compatible to device 2: cap A, device 3: cap A+B, device 4: cap A+B+C .... to allow openstack to detect it correctly, in compatible list of device 2, we would say compatible cap is A; device 3, compatible cap is A or A+B; device 4, compatible cap is A or A+B, or A+B+C; then if openstack finds device A's self cap A is contained in compatible cap of device 2/3/4, it can migrate device 1 to device 2,3,4. conversely, device 1's compatible cap is only A, so it is able to migrate device 2 to device 1, and it is not able to migrate device 3/4 to device 1. Thanks Yan > i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the > aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, > have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg > > if you want to have muplie mdev type to model the different amoutn of the resouce e.g. i915-GVTg_small i915-GVTg_large > that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg > > failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user level with > some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an aggreateion of > multiple sub resouces each of which is an mdev. so kind of like how bond port work. we would create an mdev for each of > the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only the > aggreated mdev to the instance. 
> > the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms on top > of it even to boot them let alone migrate them. > > > > So, we explicitly list out self/compatible attributes, and management > > tools only need to check if self attributes is contained compatible > > attributes. > > > > or do you mean only compatible list is enough, and the management tools > > need to find out self list by themselves? > > But I think provide a self list is easier for management tools. > > > > Thanks > > Yan > > > From smooney at redhat.com Thu Aug 20 05:16:28 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 20 Aug 2020 06:16:28 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820040116.GB24121@joy-OptiPlex-7040> References: <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> <20200820040116.GB24121@joy-OptiPlex-7040> Message-ID: On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote: > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote: > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > > > |- [path to device] > > > > > > |--- migration > > > > > > | |--- self > > > > > > | | |---device_api > > > > > > | | |---mdev_type > > > > > > | | |---software_version > > > > > > | | |---device_id > > > > > > | | |---aggregator > > > > > > | |--- compatible > > > > > > | | |---device_api > > > > > > | | |---mdev_type > > > > > > | | |---software_version > > > > > > | | |---device_id > > > > > > | | |---aggregator > > > > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > > - Attribute is coupled with kobject > > > > > > > > Is that really that bad? You have the device with an embedded kobject > > > > anyway, and you can just put things into an attribute group? > > > > > > > > [Also, I think that self/compatible split in the example makes things > > > > needlessly complex. Shouldn't semantic versioning and matching already > > > > cover nearly everything? I would expect very few cases that are more > > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > > need that self/compatible split for that, either.] 
> >
> > > Hi Cornelia,
> > >
> > > The reason I want to declare compatible list of attributes is that
> > > sometimes it's not a simple 1:1 matching of source attributes and target attributes
> > > as I demonstrated below,
> > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to
> > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2),
> > >              (mdev_type i915-GVTg_V5_8 + aggregator 4)
> >
> > the way you are doing the nameing is till really confusing by the way
> > if this has not already been merged in the kernel can you chagne the mdev
> > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device
> >
> > currently you need to deived the aggratod by the number at the end of the mdev type to figure out
> > how much of the phsicial device is being used with is a very unfridly api convention
> >
> > the way aggrator are being proposed in general is not really someting i like but i thin this at least
> > is something that should be able to correct.
> >
> > with the complexity in the mdev type name + aggrator i suspect that this will never be support
> > in openstack nova directly requireing integration via cyborg unless we can pre partion the
> > device in to mdevs staicaly and just ignore this.
> >
> > this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee
> > taht how aggreator work will be portable across vendors genericly.
> >
> > >
> > > and aggragator may be just one of such examples that 1:1 matching does not
> > > fit.
> >
> > for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change.
> >
> hi Sean,
> I understand it's hard for openstack. but 1:N is always meaningful.
> e.g.
> if source device 1 has cap A, it is compatible to
> device 2: cap A,
> device 3: cap A+B,
> device 4: cap A+B+C
> ....
> to allow openstack to detect it correctly, in compatible list of
> device 2, we would say compatible cap is A;
> device 3, compatible cap is A or A+B;
> device 4, compatible cap is A or A+B, or A+B+C;
>
> then if openstack finds device A's self cap A is contained in compatible
> cap of device 2/3/4, it can migrate device 1 to device 2,3,4.
>
> conversely, device 1's compatible cap is only A,
> so it is able to migrate device 2 to device 1, and it is not able to
> migrate device 3/4 to device 1.

Yes, we built the placement service around the idea of capabilities as traits on resource providers, which is why I originally asked whether we could model compatibility with feature flags.

We can easily model a device as supporting A, A+B or A+B+C and then select hosts and devices based on that, but the list of compatible devices you are proposing hides the feature information that we would be matching on.

Giving me the set of features you want, and listing the features available on each device, lets higher-level orchestration easily match a request to a host that can fulfil it; having a set of "other compatible devices" does not help with that.

So if a simple list of capabilities can be advertised, and if we know that two devices with the same capabilities are interchangeable, that is workable. I suspect that will not be the case, though, and that it would only work within a family of closely related mdevs, which I think is again an argument for not changing the mdev type and, at least initially, only looking at migration where the mdev type does not change.
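To make the matching style argued for above concrete, here is a minimal sketch of superset-based selection over explicit feature sets, in the spirit of CPU feature flags or placement traits. It is an illustration only: the device names and feature letters are invented, and this is not placement, libvirt or driver code.

# Minimal sketch of superset matching over advertised feature sets.
# Device names and features below are made up for illustration.

def is_compatible(source_features, target_features):
    """A target can accept the source if it offers every feature the
    source device currently exposes (i.e. it is a superset)."""
    return set(source_features) <= set(target_features)

devices = {
    "dev2": {"A"},
    "dev3": {"A", "B"},
    "dev4": {"A", "B", "C"},
}

source = {"A"}  # features exposed by the device the VM currently uses

candidates = [name for name, feats in devices.items()
              if is_compatible(source, feats)]
print(candidates)  # ['dev2', 'dev3', 'dev4'] -- all offer at least feature A

The point of the sketch is only that when each device advertises its features directly, the orchestrator can do the comparison itself; a precomputed list of "compatible devices" hides exactly the information this comparison needs.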
> > Thanks > Yan > > > i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the > > aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, > > have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg > > > > if you want to have muplie mdev type to model the different amoutn of the resouce e.g. i915-GVTg_small i915- > > GVTg_large > > that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg > > > > failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user level > > with > > some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an aggreateion > > of > > multiple sub resouces each of which is an mdev. so kind of like how bond port work. we would create an mdev for each > > of > > the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only the > > aggreated mdev to the instance. > > > > the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms on > > top > > of it even to boot them let alone migrate them. > > > > > > So, we explicitly list out self/compatible attributes, and management > > > tools only need to check if self attributes is contained compatible > > > attributes. > > > > > > or do you mean only compatible list is enough, and the management tools > > > need to find out self list by themselves? > > > But I think provide a self list is easier for management tools. > > > > > > Thanks > > > Yan > > > > > From yan.y.zhao at intel.com Thu Aug 20 06:27:25 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 20 Aug 2020 14:27:25 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> <20200820040116.GB24121@joy-OptiPlex-7040> Message-ID: <20200820062725.GB24997@joy-OptiPlex-7040> On Thu, Aug 20, 2020 at 06:16:28AM +0100, Sean Mooney wrote: > On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote: > > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote: > > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > > > On 2020/8/18 下午4:55, Daniel P. 
Berrangé wrote: > > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > > > > > |- [path to device] > > > > > > > |--- migration > > > > > > > | |--- self > > > > > > > | | |---device_api > > > > > > > | | |---mdev_type > > > > > > > | | |---software_version > > > > > > > | | |---device_id > > > > > > > | | |---aggregator > > > > > > > | |--- compatible > > > > > > > | | |---device_api > > > > > > > | | |---mdev_type > > > > > > > | | |---software_version > > > > > > > | | |---device_id > > > > > > > | | |---aggregator > > > > > > > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > > > - Attribute is coupled with kobject > > > > > > > > > > Is that really that bad? You have the device with an embedded kobject > > > > > anyway, and you can just put things into an attribute group? > > > > > > > > > > [Also, I think that self/compatible split in the example makes things > > > > > needlessly complex. Shouldn't semantic versioning and matching already > > > > > cover nearly everything? I would expect very few cases that are more > > > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > > > need that self/compatible split for that, either.] > > > > > > > > Hi Cornelia, > > > > > > > > The reason I want to declare compatible list of attributes is that > > > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > > > as I demonstrated below, > > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > > > > > the way you are doing the nameing is till really confusing by the way > > > if this has not already been merged in the kernel can you chagne the mdev > > > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device > > > > > > currently you need to deived the aggratod by the number at the end of the mdev type to figure out > > > how much of the phsicial device is being used with is a very unfridly api convention > > > > > > the way aggrator are being proposed in general is not really someting i like but i thin this at least > > > is something that should be able to correct. > > > > > > with the complexity in the mdev type name + aggrator i suspect that this will never be support > > > in openstack nova directly requireing integration via cyborg unless we can pre partion the > > > device in to mdevs staicaly and just ignore this. > > > > > > this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee > > > taht how aggreator work will be portable across vendors genericly. > > > > > > > > > > > and aggragator may be just one of such examples that 1:1 matching does not > > > > fit. > > > > > > for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change. > > > > > > > hi Sean, > > I understand it's hard for openstack. but 1:N is always meaningful. > > e.g. 
> > if source device 1 has cap A, it is compatible to > > device 2: cap A, > > device 3: cap A+B, > > device 4: cap A+B+C > > .... > > to allow openstack to detect it correctly, in compatible list of > > device 2, we would say compatible cap is A; > > device 3, compatible cap is A or A+B; > > device 4, compatible cap is A or A+B, or A+B+C; > > > > then if openstack finds device A's self cap A is contained in compatible > > cap of device 2/3/4, it can migrate device 1 to device 2,3,4. > > > > conversely, device 1's compatible cap is only A, > > so it is able to migrate device 2 to device 1, and it is not able to > > migrate device 3/4 to device 1. > > yes we build the palcement servce aroudn the idea of capablites as traits on resocue providres. > which is why i originally asked if we coudl model compatibality with feature flags > > we can seaislyt model deivce as aupport A, A+B or A+B+C > and then select hosts and evice based on that but > > the list of compatable deivce you are propsoeing hide this feature infomation which whould be what we are matching on. > > give me a lset of feature you want and list ting the feature avaiable on each device allow highre level ocestation to > easily match the request to a host that can fulllfile it btu thave a set of other compatihble device does not help with > that > > so if a simple list a capabliteis can be advertiese d and if we know tha two dievce with the same capablity are > intercahangebale that is workabout i suspect that will not be the case however and it would onely work within a familay > of mdevs that are closely related. which i think agian is an argument for not changeing the mdev type and at least > intially only look at migatreion where the mdev type doee not change initally. > sorry Sean, I don't understand your words completely. Please allow me to write it down in my words, and please confirm if my understanding is right. 1. you mean you agree on that each field is regarded as a trait, and openstack can compare by itself if source trait is a subset of target trait, right? e.g. source device field1=A1 field2=A2+B2 field3=A3 target device field1=A1+B1 field2=A2+B2 filed3=A3 then openstack sees that field1/2/3 in source is a subset of field1/2/3 in target, so it's migratable to target? 2. mdev_type + aggregator make it hard to achieve the above elegant solution, so it's best to avoid the combined comparing of mdev_type + aggregator. do I understand it correctly? 3. you don't like self list and compatible list, because it is hard for openstack to compare different traits? e.g. if we have self list and compatible list, then as below, openstack needs to compare if self field1/2/3 is a subset of compatible field 1/2/3. source device: self field1=A1 self field2=A2+B2 self field3=A3 compatible field1=A1 compatible field2=A2;B2;A2+B2; compatible field3=A3 target device: self field1=A1+B1 self field2=A2+B2 self field3=A3 compatible field1=A1;B1;A1+B1; compatible field2=A2;B2;A2+B2; compatible field3=A3 Thanks Yan > > > > > > > i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the > > > aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, > > > have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg > > > > > > if you want to have muplie mdev type to model the different amoutn of the resouce e.g. 
i915-GVTg_small i915- > > > GVTg_large > > > that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg > > > > > > failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user level > > > with > > > some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an aggreateion > > > of > > > multiple sub resouces each of which is an mdev. so kind of like how bond port work. we would create an mdev for each > > > of > > > the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only the > > > aggreated mdev to the instance. > > > > > > the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms on > > > top > > > of it even to boot them let alone migrate them. > > > > > > > > So, we explicitly list out self/compatible attributes, and management > > > > tools only need to check if self attributes is contained compatible > > > > attributes. > > > > > > > > or do you mean only compatible list is enough, and the management tools > > > > need to find out self list by themselves? > > > > But I think provide a self list is easier for management tools. > > > > > > > > Thanks > > > > Yan > > > > > > > > > From cohuck at redhat.com Thu Aug 20 12:27:40 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Thu, 20 Aug 2020 14:27:40 +0200 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> <20200819081338.GC21172@joy-OptiPlex-7040> Message-ID: <20200820142740.6513884d.cohuck@redhat.com> On Wed, 19 Aug 2020 17:28:38 +0800 Jason Wang wrote: > On 2020/8/19 下午4:13, Yan Zhao wrote: > > On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote: > >> On 2020/8/19 下午2:59, Yan Zhao wrote: > >>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: > >>>> On 2020/8/19 上午11:30, Yan Zhao wrote: > >>>>> hi All, > >>>>> could we decide that sysfs is the interface that every VFIO vendor driver > >>>>> needs to provide in order to support vfio live migration, otherwise the > >>>>> userspace management tool would not list the device into the compatible > >>>>> list? > >>>>> > >>>>> if that's true, let's move to the standardizing of the sysfs interface. > >>>>> (1) content > >>>>> common part: (must) > >>>>> - software_version: (in major.minor.bugfix scheme) > >>>> This can not work for devices whose features can be negotiated/advertised > >>>> independently. (E.g virtio devices) I thought the 'software_version' was supposed to describe kind of a 'protocol version' for the data we transmit? I.e., you add a new field, you bump the version number. > >>>> > >>> sorry, I don't understand here, why virtio devices need to use vfio interface? > >> > >> I don't see any reason that virtio devices can't be used by VFIO. Do you? > >> > >> Actually, virtio devices have been used by VFIO for many years: > >> > >> - passthrough a hardware virtio devices to userspace(VM) drivers > >> - using virtio PMD inside guest > >> > > So, what's different for it vs passing through a physical hardware via VFIO? > > > The difference is in the guest, the device could be either real hardware > or emulated ones. 
> > > > even though the features are negotiated dynamically, could you explain > > why it would cause software_version not work? > > > Virtio device 1 supports feature A, B, C > Virtio device 2 supports feature B, C, D > > So you can't migrate a guest from device 1 to device 2. And it's > impossible to model the features with versions. We're talking about the features offered by the device, right? Would it be sufficient to mandate that the target device supports the same features or a superset of the features supported by the source device? > > > > > > > >>> I think this thread is discussing about vfio related devices. > >>> > >>>>> - device_api: vfio-pci or vfio-ccw ... > >>>>> - type: mdev type for mdev device or > >>>>> a signature for physical device which is a counterpart for > >>>>> mdev type. > >>>>> > >>>>> device api specific part: (must) > >>>>> - pci id: pci id of mdev parent device or pci id of physical pci > >>>>> device (device_api is vfio-pci)API here. > >>>> So this assumes a PCI device which is probably not true. > >>>> > >>> for device_api of vfio-pci, why it's not true? > >>> > >>> for vfio-ccw, it's subchannel_type. > >> > >> Ok but having two different attributes for the same file is not good idea. > >> How mgmt know there will be a 3rd type? > > that's why some attributes need to be common. e.g. > > device_api: it's common because mgmt need to know it's a pci device or a > > ccw device. and the api type is already defined vfio.h. > > (The field is agreed by and actually suggested by Alex in previous mail) > > type: mdev_type for mdev. if mgmt does not understand it, it would not > > be able to create one compatible mdev device. > > software_version: mgmt can compare the major and minor if it understands > > this fields. > > > I think it would be helpful if you can describe how mgmt is expected to > work step by step with the proposed sysfs API. This can help people to > understand. My proposal would be: - check that device_api matches - check possible device_api specific attributes - check that type matches [I don't think the combination of mdev types and another attribute to determine compatibility is a good idea; actually, the current proposal confuses me every time I look at it] - check that software_version is compatible, assuming semantic versioning - check possible type-specific attributes > > Thanks for the patience. Since sysfs is uABI, when accepted, we need > support it forever. That's why we need to be careful. Nod. (...) 
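As a rough sketch of the ordered checks proposed above, a management tool that had already read a device's migration attributes out of sysfs into a plain dict could compare source and target roughly as follows. The attribute names mirror the fields discussed in this thread, but the concrete semantic-versioning rule (same major version, destination minor greater than or equal to source minor) is an assumption for illustration, not something the thread has settled on.

# Sketch only. Assumes attributes were already read from sysfs into dicts like:
#   {"device_api": "vfio-pci", "type": "i915-GVTg_V5_4",
#    "software_version": "0.1.0", "pci_id": "80865963"}

def version_ok(src, dst):
    # assumed rule: same major, destination minor >= source minor
    s_major, s_minor = (int(x) for x in src.split(".")[:2])
    d_major, d_minor = (int(x) for x in dst.split(".")[:2])
    return s_major == d_major and d_minor >= s_minor

def can_migrate(src, dst):
    if src["device_api"] != dst["device_api"]:
        return False
    if src["type"] != dst["type"]:   # simple case: no type change
        return False
    if not version_ok(src["software_version"], dst["software_version"]):
        return False
    # Remaining (device_api / type / vendor specific) attributes are compared
    # for equality here; a real tool might instead try to set a writable
    # attribute on the target to the source's value, as discussed earlier
    # for the aggregator.
    for key, value in src.items():
        if key in ("device_api", "type", "software_version"):
            continue
        if dst.get(key) != value:
            return False
    return True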
From smooney at redhat.com Thu Aug 20 13:24:26 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 20 Aug 2020 14:24:26 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820062725.GB24997@joy-OptiPlex-7040> References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> <20200820040116.GB24121@joy-OptiPlex-7040> <20200820062725.GB24997@joy-OptiPlex-7040> Message-ID: <47d216330e10152f0f5d27421da60a7b1c52e5f0.camel@redhat.com> On Thu, 2020-08-20 at 14:27 +0800, Yan Zhao wrote: > On Thu, Aug 20, 2020 at 06:16:28AM +0100, Sean Mooney wrote: > > On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote: > > > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote: > > > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > > > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > > > > > > > |- [path to device] > > > > > > > > |--- migration > > > > > > > > | |--- self > > > > > > > > | | |---device_api > > > > > > > > | | |---mdev_type > > > > > > > > | | |---software_version > > > > > > > > | | |---device_id > > > > > > > > | | |---aggregator > > > > > > > > | |--- compatible > > > > > > > > | | |---device_api > > > > > > > > | | |---mdev_type > > > > > > > > | | |---software_version > > > > > > > > | | |---device_id > > > > > > > > | | |---aggregator > > > > > > > > > > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > > > > - Attribute is coupled with kobject > > > > > > > > > > > > Is that really that bad? You have the device with an embedded kobject > > > > > > anyway, and you can just put things into an attribute group? > > > > > > > > > > > > [Also, I think that self/compatible split in the example makes things > > > > > > needlessly complex. Shouldn't semantic versioning and matching already > > > > > > cover nearly everything? I would expect very few cases that are more > > > > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > > > > need that self/compatible split for that, either.] 
> > > > > > > > > > Hi Cornelia, > > > > > > > > > > The reason I want to declare compatible list of attributes is that > > > > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > > > > as I demonstrated below, > > > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > > > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > > > > > > > the way you are doing the nameing is till really confusing by the way > > > > if this has not already been merged in the kernel can you chagne the mdev > > > > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device > > > > > > > > currently you need to deived the aggratod by the number at the end of the mdev type to figure out > > > > how much of the phsicial device is being used with is a very unfridly api convention > > > > > > > > the way aggrator are being proposed in general is not really someting i like but i thin this at least > > > > is something that should be able to correct. > > > > > > > > with the complexity in the mdev type name + aggrator i suspect that this will never be support > > > > in openstack nova directly requireing integration via cyborg unless we can pre partion the > > > > device in to mdevs staicaly and just ignore this. > > > > > > > > this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee > > > > taht how aggreator work will be portable across vendors genericly. > > > > > > > > > > > > > > and aggragator may be just one of such examples that 1:1 matching does not > > > > > fit. > > > > > > > > for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change. > > > > > > > > > > hi Sean, > > > I understand it's hard for openstack. but 1:N is always meaningful. > > > e.g. > > > if source device 1 has cap A, it is compatible to > > > device 2: cap A, > > > device 3: cap A+B, > > > device 4: cap A+B+C > > > .... > > > to allow openstack to detect it correctly, in compatible list of > > > device 2, we would say compatible cap is A; > > > device 3, compatible cap is A or A+B; > > > device 4, compatible cap is A or A+B, or A+B+C; > > > > > > then if openstack finds device A's self cap A is contained in compatible > > > cap of device 2/3/4, it can migrate device 1 to device 2,3,4. > > > > > > conversely, device 1's compatible cap is only A, > > > so it is able to migrate device 2 to device 1, and it is not able to > > > migrate device 3/4 to device 1. > > > > yes we build the palcement servce aroudn the idea of capablites as traits on resocue providres. > > which is why i originally asked if we coudl model compatibality with feature flags > > > > we can seaislyt model deivce as aupport A, A+B or A+B+C > > and then select hosts and evice based on that but > > > > the list of compatable deivce you are propsoeing hide this feature infomation which whould be what we are matching > > on. 
> > > > give me a lset of feature you want and list ting the feature avaiable on each device allow highre level ocestation > > to > > easily match the request to a host that can fulllfile it btu thave a set of other compatihble device does not help > > with > > that > > > > so if a simple list a capabliteis can be advertiese d and if we know tha two dievce with the same capablity are > > intercahangebale that is workabout i suspect that will not be the case however and it would onely work within a > > familay > > of mdevs that are closely related. which i think agian is an argument for not changeing the mdev type and at least > > intially only look at migatreion where the mdev type doee not change initally. > > > > sorry Sean, I don't understand your words completely. > Please allow me to write it down in my words, and please confirm if my > understanding is right. > 1. you mean you agree on that each field is regarded as a trait, and > openstack can compare by itself if source trait is a subset of target trait, right? > e.g. > source device > field1=A1 > field2=A2+B2 > field3=A3 > > target device > field1=A1+B1 > field2=A2+B2 > filed3=A3 > > then openstack sees that field1/2/3 in source is a subset of field1/2/3 in > target, so it's migratable to target? yes this is basically how cpu feature work. if we see the host cpu on the dest is a supperset of the cpu feature used by the vm we know its safe to migrate. > > 2. mdev_type + aggregator make it hard to achieve the above elegant > solution, so it's best to avoid the combined comparing of mdev_type + aggregator. > do I understand it correctly? yes and no. one of the challange that mdevs pose right now is that sometiem mdev model independent resouces and sometimes multipe mdev types consume the same underlying resouces there is know way for openstack to know if i915-GVTg_V5_2 and i915-GVTg_V5_4 consume the same resouces or not. as such we cant do the accounting properly so i would much prefer to have just 1 mdev type i915-GVTg and which models the minimal allocatable unit and then say i want 4 of them comsed into 1 device then have a second mdev type that does that since what that means in pratice is we cannot trust the available_instances for a given mdev type as consuming a different mdev type might change it. aggrators makes that problem worse. which is why i siad i would prefer if instead of aggreator as prposed each consumable resouce was reported indepenedly as different mdev types and then we composed those like we would when bond ports creating an attachment or other logical aggration that refers to instance of mdevs of differing type which we expose as a singel mdev that is exposed to the guest. in a concreate example we might say create a aggreator of 64 cuda cores and 32 tensor cores and "bond them" or aggrate them as a single attachme mdev and provide that to a ml workload guest. a differnt guest could request 1 instace of the nvenc video encoder and one instance of the nvenc video decoder but no cuda or tensor for a video transcoding workload. if each of those componets are indepent mdev types and can be composed with that granularity then i think that approch is better then the current aggreator with vendor sepcific fileds. we can model the phsical device as being multipel nested resouces with different traits for each type of resouce and different capsities for the same. we can even model how many of the attachments/compositions can be done indepently if there is a limit on that. 
|- [parent physical device] |--- Vendor-specific-attributes [optional] |--- [mdev_supported_types] | |--- [] | | |--- create | | |--- name | | |--- available_instances | | |--- device_api | | |--- description | | |--- [devices] | |--- [] | | |--- create | | |--- name | | |--- available_instances | | |--- device_api | | |--- description | | |--- [devices] | |--- [] | |--- create | |--- name | |--- available_instances | |--- device_api | |--- description | |--- [devices] a benifit of this appoch is we would be the mdev types would not change on migration and we could jsut compuare a a simeple version stirgh and feature flag list to determin comaptiablity in a vendor neutral way. i dont nessisarly need to know what the vendeor flags mean just that the dest is a subset of the source and that the semaitic version numbers say the mdevs are compatible. > > 3. you don't like self list and compatible list, because it is hard for > openstack to compare different traits? > e.g. if we have self list and compatible list, then as below, openstack needs > to compare if self field1/2/3 is a subset of compatible field 1/2/3. currnetly we only use mdevs for vGPUs and in our documentaiton we tell customer to model the mdev_type as a trait and request it as a reuiqred trait. so for customer that are doing that today changing mdev types is not really an option. we would prefer that they request the feature they need instead of a spefic mdev type so we can select any that meets there needs for example we have a bunch of traits for cuda support https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/cuda.py or driectx/vulkan/opengl https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/api.py these are closely analogous to cpu feature flag lix avx or sse https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86/__init__.py#L16 so when it comes to compatiablities it would be ideal if you could express capablities as something like a cpu feature flag then we can eaisly model those as traits. > > source device: > self field1=A1 > self field2=A2+B2 > self field3=A3 > > compatible field1=A1 > compatible field2=A2;B2;A2+B2; > compatible field3=A3 > > > target device: > self field1=A1+B1 > self field2=A2+B2 > self field3=A3 > > compatible field1=A1;B1;A1+B1; > compatible field2=A2;B2;A2+B2; > compatible field3=A3 > > > Thanks > Yan > > > > > > > > > > > > i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the > > > > aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, > > > > have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg > > > > > > > > if you want to have muplie mdev type to model the different amoutn of the resouce e.g. i915-GVTg_small i915- > > > > GVTg_large > > > > that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg > > > > > > > > failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user > > > > level > > > > with > > > > some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an > > > > aggreateion > > > > of > > > > multiple sub resouces each of which is an mdev. so kind of like how bond port work. 
we would create an mdev for > > > > each > > > > of > > > > the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only > > > > the > > > > aggreated mdev to the instance. > > > > > > > > the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms > > > > on > > > > top > > > > of it even to boot them let alone migrate them. > > > > > > > > > > So, we explicitly list out self/compatible attributes, and management > > > > > tools only need to check if self attributes is contained compatible > > > > > attributes. > > > > > > > > > > or do you mean only compatible list is enough, and the management tools > > > > > need to find out self list by themselves? > > > > > But I think provide a self list is easier for management tools. > > > > > > > > > > Thanks > > > > > Yan > > > > > > > > > > > > > From arnaud.morin at gmail.com Thu Aug 20 15:35:03 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 20 Aug 2020 15:35:03 +0000 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> <65204b738f13fcea16b9b6d5a68149c89be73e6a.camel@redhat.com> Message-ID: <20200820153503.GY31915@sync> Hey all, TLDR: - Patch in [1] updated - Example of usage in [3] - Agree with fixing nova/rabbit/oslo but would like to keep this ping endpoint also - Totally agree with documentation needed Long: Thank you all for your review and for the great information you bring to that topic! First thing, we are not yet using that patch in production, but in testing/dev only for now (at OVH). But the plan is to use it in production ASAP. Also, we initially pushed that for neutron agent, that's why I missed the fact that nova already used the "ping" endpoint, sorry for that. Anyway, I dont care about the naming, so in latest patchset of [1], you will see that I changed the name of the endpoint following Ken Giusti suggestions. The bug reported in [2] looks very similar to what we saw. Thank you Sean for bringing that to attention in this thread. To detect this error, using the above "ping" endpoint in oslo, we can use a script like the one in [3] (sorry about it, I can write better python :p). As mentionned by Sean in a previous mail, I am calling effectively the topic "compute.host123456.sbg5.cloud.ovh.net" in "nova" exchange. My initial plan would be to identify topics related to a compute and do pings in all topics, to make sure that all of them are answering. I am not yet sure about how often and if this is a good plan btw. Anyway, the compute is reporting status as UP, but the ping is timeouting, which is exactly what I wanted to detect! I mostly agree with all your comments about the fact that this is a trick that we do as operator, and using the RPC bus is maybe not the best approach, but this is pragmatic and quite simple IMHO. What I also like in this solution is the fact that this is partialy outside of OpenStack: the endpoint is inside, but doing the ping is external. Monitoring OpenStack is not always easy, and sometimes we struggle on finding the root cause of some issues. Having such endpoint allow us to monitor OpenStack from an external point of view, but still in a deeper way. 
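For illustration, such an external check can stay very small. A rough sketch with oslo.messaging, assuming the endpoint from [1] ends up exposed under a name like "oslo_rpc_server_ping" (the final name is still being discussed in review), and with the broker URL and compute host below being examples only:

from oslo_config import cfg
import oslo_messaging

# Target topic "compute" + server "host123456..." routes to the
# "compute.host123456.sbg5.cloud.ovh.net" queue mentioned above.
transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url="rabbit://monitoring:secret@rabbit.example.net:5672/")
target = oslo_messaging.Target(topic="compute",
                               server="host123456.sbg5.cloud.ovh.net")
client = oslo_messaging.RPCClient(transport, target, timeout=10)

try:
    client.call({}, "oslo_rpc_server_ping")
    print("compute RPC queue is answering")
except oslo_messaging.MessagingTimeout:
    # The service can still report UP while its queue no longer delivers.
    print("RPC ping timed out")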
It's like a probe in your car telling you that even if you are still running, your engine is off :) Still, making sure that this bug is fixed by doing some work on (rabbit|oslo.messaging|nova|whatever} is the best thing to do. However, IMO, this does not prevent this rpc ping endpoint from existing. Last, but not least, I totally agree about documenting this, but also adding some documentation on how to configure rabbit and OpenStack services in a way that fit operator needs. There are plenty of parameters which could be tweaked on both OpenStack and rabbit side. IMO, we need to explain a little bit more what are the impact of setting a specific parameter to a given value. For example, in another discussion ([4]), we were talking about "durable" queues in rabbit. We manage to find that if we enable HA, we should also enable durability of queues. Anyway that's another topic, and this is also something we discuss in large-scale group. Thank you all, [1] https://review.opendev.org/#/c/735385/ [2] https://bugs.launchpad.net/nova/+bug/1854992 [3] http://paste.openstack.org/show/796990/ [4] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016362.html -- Arnaud Morin On 13.08.20 - 17:17, Ken Giusti wrote: > On Thu, Aug 13, 2020 at 12:30 PM Ben Nemec wrote: > > > > > > > On 8/13/20 11:07 AM, Sean Mooney wrote: > > >> I think it's probably > > >> better to provide a well-defined endpoint for them to talk to rather > > >> than have everyone implement their own slightly different RPC ping > > >> mechanism. The docs for this feature should be very explicit that this > > >> is the only thing external code should be calling. > > > ya i think that is a good approch. > > > i would still prefer if people used say middelware to add a service ping > > admin api endpoint > > > instead of driectly calling the rpc endpoint to avoid exposing rabbitmq > > but that is out of scope of this discussion. > > > > Completely agree. In the long run I would like to see this replaced with > > better integrated healthchecking in OpenStack, but we've been talking > > about that for years and have made minimal progress. > > > > > > > >> > > >>> > > >>> so if this does actully detect somethign we can otherwise detect and > > the use cases involves using it within > > >>> the openstack services not form an external source then i think that > > is fine but we proably need to use another > > >>> name (alive? status?) or otherewise modify nova so that there is no > > conflict. > > >>>> > > >> > > >> If I understand your analysis of the bug correctly, this would have > > >> caught that type of outage after all since the failure was asymmetric. > > > am im not sure > > > it might yes looking at https://review.opendev.org/#/c/735385/6 > > > its not clear to me how the endpoint is invoked. is it doing a topic > > send or a direct send? > > > to detech the failure you would need to invoke a ping on the compute > > service and that ping would > > > have to been encured on the to nova topic exchante with a routing key of > > compute. > > > > > > if the compute topic queue was broken either because it was nolonger > > bound to the correct topic or due to some other > > > rabbitmq error then you woudl either get a message undeilverbale error > > of some kind with the mandaroy flag or likely a > > > timeout without the mandaroty flag. so if the ping would be routed usign > > a topic too compute. > > > then yes it would find this. 
> > > > > > although we can also detech this ourselves and fix it using the > > mandatory flag i think by just recreating the queue wehn > > > it extis but we get an undeliverable message, at least i think we can > > rabbit is not my main are of expertiese so it > > > woudl be nice is someone that know more about it can weigh in on that. > > > > I pinged Ken this morning to take a look at that. He should be able to > > tell us whether it's a good idea or crazy talk. :-) > > > > Like I can tell the difference between crazy and good ideas. Ben I thought > you knew me better. ;) > > As discussed you can enable the mandatory flag on a per RPCClient instance, > for example: > > _topts = oslo_messaging.TransportOptions(at_least_once=True) > client = oslo_messaging.RPCClient(self.transport, > self.target, > timeout=conf.timeout, > version_cap=conf.target_version, > transport_options=_topts).prepare() > > This will cause an rpc call/cast to fail if rabbitmq cannot find a queue > for the rpc request message [note the difference between 'queuing the > message' and 'having the message consumed' - the mandatory flag has nothing > to do with whether or not the message is eventually consumed]. > > Keep in mind that there may be some cases where having no active consumers > is ok and you do not want to get a delivery failure exception - > specifically fanout or perhaps cast. Depends on the use case. If there > are fanout use cases that fail or degrade if all present services don't get > a message then the mandatory flag will not detect an error if a subset of > the bindings are lost. > > My biggest concern with this type of failure (lost binding) is that > apparently the consumer is none the wiser when it happens. Without some > sort of event issued by rabbitmq the RPC server cannot detect this problem > and take corrective actions (or at least I cannot think of any ATM). > > > -- > Ken Giusti (kgiusti at gmail.com) From juliaashleykreger at gmail.com Thu Aug 20 15:49:14 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Thu, 20 Aug 2020 08:49:14 -0700 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: I'm having a sense of deja vu! Because of the way the mechanics work, the iscsi deploy driver is in an unfortunate position of being harder to troubleshoot and diagnose failures. Which basically means we've not been able to really identify common failures and add logic to handle them appropriately, like we are able to with a tcp socket and file download. Based on this alone, I think it makes a solid case for us to seriously consider deprecation. Overall, I'm +1 for the proposal and I believe over two cycles is the right way to go. I suspect we're going to have lots of push back from the TripleO community because there has been resistance to change their default usage in the past. As such I'm adding them to the subject so hopefully they will be at least aware. I guess my other worry is operators who already have a substantial operational infrastructure investment built around the iscsi deploy interface. I wonder why they didn't use direct, but maybe they have all migrated in the past ?5? years. This could just be a non-concern in reality, I'm just not sure. Of course, if someone is willing to step up and make the iscsi deployment interface their primary focus, that also shifts the discussion to making direct the default interface? 
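For operators who want to move over ahead of any deprecation, the switch itself is only configuration. A hedged ironic.conf sketch (option names as in the ironic admin docs; values are illustrative and need adapting per deployment):

[DEFAULT]
# Keep iscsi available during the transition, but prefer direct.
enabled_deploy_interfaces = direct,iscsi
default_deploy_interface = direct

[agent]
# Serve instance images from ironic's own HTTP server instead of Swift,
# as discussed in the proposal quoted below.
image_download_source = http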
-Julia On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur wrote: > > Hi all, > > Side note for those lacking context: this proposal concerns deprecating one of the ironic deploy interfaces detailed in https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It does not affect the boot-from-iSCSI feature. > > I would like to propose deprecating and removing the 'iscsi' deploy interface over the course of the next 2 cycles. The reasons are: > 1) The iSCSI deploy is a source of occasional cryptic bugs when a target cannot be discovered or mounted properly. > 2) Its security is questionable: I don't think we even use authentication. > 3) Operators confusion: right now we default to the iSCSI deploy but pretty much direct everyone who cares about scalability or security to the 'direct' deploy. > 4) Cost of maintenance: our feature set is growing, our team - not so much. iscsi_deploy.py is 800 lines of code that can be removed, and some dependencies that can be dropped as well. > > As far as I can remember, we've kept the iSCSI deploy for two reasons: > 1) The direct deploy used to require Glance with Swift backend. The recently added [agent]image_download_source option allows caching and serving images via the ironic's HTTP server, eliminating this problem. I guess we'll have to switch to 'http' by default for this option to keep the out-of-box experience. > 2) Memory footprint of the direct deploy. With the raw images streaming we no longer have to cache the downloaded images in the agent memory, removing this problem as well (I'm not even sure how much of a problem it is in 2020, even my phone has 4GiB of RAM). > > If this proposal is accepted, I suggest to execute it as follows: > Victoria release: > 1) Put an early deprecation warning in the release notes. > 2) Announce the future change of the default value for [agent]image_download_source. > W release: > 3) Change [agent]image_download_source to 'http' by default. > 4) Remove iscsi from the default enabled_deploy_interfaces and move it to the back of the supported list (effectively making direct deploy the default). > X release: > 5) Remove the iscsi deploy code from both ironic and IPA. > > Thoughts, opinions, suggestions? > > Dmitry From sean.mcginnis at gmx.com Thu Aug 20 16:39:37 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 20 Aug 2020 11:39:37 -0500 Subject: [all] Proposed Wallaby cycle schedule In-Reply-To: <0083db2a-0ef7-99fa-0c45-fd170f7d7902@gmx.com> References: <2e56de68-c416-e3ea-f3da-caaf9399287d@gmx.com> <0083db2a-0ef7-99fa-0c45-fd170f7d7902@gmx.com> Message-ID: >> The current thinking is it will likely take place in May (nothing is >> set, just an educated guess, so please don't use that for any other >> planning). So for the sake of figuring out the release schedule, we are >> targeting a release date in early May. Hopefully this will then align >> well with event plans. >> >> I have a proposed release schedule up for review here: >> >> https://review.opendev.org/#/c/744729/ ... > > As an alternative option, I have proposed a 26 week option: > > https://review.opendev.org/#/c/745911/ > The majority of support so far has been for the 26 week schedule, with the only -1 votes going to the 29 week option. This is a final call to raise any objects or issues with either option. Unless something changes, we plan to approve the 26 week schedule early next week. Thanks! 
Sean From elfosardo at gmail.com Thu Aug 20 17:05:49 2020 From: elfosardo at gmail.com (Riccardo Pittau) Date: Thu, 20 Aug 2020 19:05:49 +0200 Subject: [ironic] next Victoria meetup In-Reply-To: References: Message-ID: Hello again! Friendly reminder about the vote to schedule the next Ironic Virtual Meetup! Since a lot of people are on vacation in this period, we've decided to postpone the final day for the vote to next Wednesday August 26 And we have an etherpad now! https://etherpad.opendev.org/p/Ironic-Victoria-midcycle Feel free to propose topics, we'll discuss also about the upcoming PTG and Forum. Thanks! A si biri Riccardo On Mon, Aug 17, 2020 at 6:29 PM Riccardo Pittau wrote: > Hello everyone! > > The time for the next Ironic virtual meetup is close! > It will be an opportunity to review what has been done in the last months, > exchange ideas and plan for the time before the upcoming victoria release, > with an eye towards the future. > > We're aiming to have the virtual meetup the first week of September > (Monday August 31 - Friday September 4) and split it in two days, with one > two-hours slot per day. > Please vote for your best time slots here: > https://doodle.com/poll/pi4x3kuxamf4nnpu > > We're planning to leave the vote open at least for the entire week until > Friday August 21, so to have enough time to announce the final slots and > planning early next week. > > Thanks! > > A si biri > > Riccardo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Thu Aug 20 17:09:11 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 20 Aug 2020 19:09:11 +0200 Subject: [cloudkitty] Resuming CloudKitty IRC meetings Message-ID: Hello, We are resuming IRC meetings for the CloudKitty project using the existing calendar schedule. The first meeting will be on Monday August 24 at 1400 UTC in #cloudkitty on freenode, then every two weeks. Everyone is welcome: contributors, users, and anyone who would like to contribute or use CloudKitty but doesn't know how to get started. The agenda is available on Etherpad [1]. The meeting description [2] was stating that meetings were on the first and third Monday of the month, but the calendar schedule was using odd weeks. I've submitted a change [3] to synchronise the description: let's meet on odd weeks instead of using a month-based schedule. Thanks in advance to all of you helping to keep the project going. Pierre Riteau (priteau) [1] https://etherpad.opendev.org/p/cloudkitty-meeting-topics [2] http://eavesdrop.openstack.org/#CloudKitty_Team_Meeting [3] https://review.opendev.org/#/c/747256/ From dev.faz at gmail.com Thu Aug 20 17:16:17 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Thu, 20 Aug 2020 19:16:17 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: <20200818120708.GV31915@sync> References: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <20200818120708.GV31915@sync> Message-ID: Hi, just another idea: Rabbitmq is able to count undelivered messages. We could use this information to detect the broken bindings (causing undeliverable messages). Anyone already doing this? I currently don't have a way to reproduce the broken bindings, so I'm unable to proof the idea. Seems we have to wait issue to happen again - what - hopefully - never happens :) Fabian Arnaud Morin schrieb am Di., 18. Aug. 2020, 14:07: > Hey all, > > About the vexxhost strategy to use only one rabbit server and manage HA > through > rabbit. 
> Do you plan to do the same for MariaDB/MySQL? > > -- > Arnaud Morin > > On 14.08.20 - 18:45, Fabian Zimmermann wrote: > > Hi, > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > one rabbitmq Container per Service. Just the kubernetes self healing is > > used as "ha" for rabbitmq. > > > > That seems to match with my finding: run rabbitmq standalone and use an > > external system to restart rabbitmq if required. > > > > Fabian > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, > 16:59: > > > > > Fabian, > > > > > > what do you mean? > > > > > > >> I think vexxhost is running (1) with their openstack-operator - for > > > reasons. > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > wrote: > > > > > > > > Hello again, > > > > > > > > just a short update about the results of my tests. > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > 1. without durable-queues and without replication - just one > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > 2. durable-queues and replication > > > > > > > > Any other combination of these settings leads to more or less issues > with > > > > > > > > * broken / non working bindings > > > > * broken queues > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > reasons. > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > replication but without durable-queues. > > > > > > > > May someone point me to the best way to document these findings to > some > > > official doc? > > > > I think a lot of installations out there will run into issues if - > under > > > load - a node fails. > > > > > > > > Fabian > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > dev.faz at gmail.com>: > > > >> > > > >> Hi, > > > >> > > > >> just did some short tests today in our test-environment (without > > > durable queues and without replication): > > > >> > > > >> * started a rally task to generate some load > > > >> * kill-9-ed rabbitmq on one node > > > >> * rally task immediately stopped and the cloud (mostly) stopped > working > > > >> > > > >> after some debugging i found (again) exchanges which had bindings to > > > queues, but these bindings didnt forward any msgs. > > > >> Wrote a small script to detect these broken bindings and will now > check > > > if this is "reproducible" > > > >> > > > >> then I will try "durable queues" and "durable queues with > replication" > > > to see if this helps. Even if I would expect > > > >> rabbitmq should be able to handle this without these "hidden broken > > > bindings" > > > >> > > > >> This just FYI. > > > >> > > > >> Fabian > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Aug 20 17:18:38 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 20 Aug 2020 12:18:38 -0500 Subject: [cloudkitty] Resuming CloudKitty IRC meetings In-Reply-To: References: Message-ID: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> On 8/20/20 12:09 PM, Pierre Riteau wrote: > Hello, > > We are resuming IRC meetings for the CloudKitty project using the > existing calendar schedule. > The first meeting will be on Monday August 24 at 1400 UTC in > #cloudkitty on freenode, then every two weeks. > > Everyone is welcome: contributors, users, and anyone who would like to > contribute or use CloudKitty but doesn't know how to get started. 
> The agenda is available on Etherpad [1]. > > The meeting description [2] was stating that meetings were on the > first and third Monday of the month, but the calendar schedule was > using odd weeks. I've submitted a change [3] to synchronise the > description: let's meet on odd weeks instead of using a month-based > schedule. With what you said above, this is actually taking place on even weeks now. Can you clarify - is it a one-off that you will be holding this on the 4th Monday next week? Or do you actually intend to switch these to even weeks (in which case that patch is incorrect)? > > Thanks in advance to all of you helping to keep the project going. > > Pierre Riteau (priteau) > > [1] https://etherpad.opendev.org/p/cloudkitty-meeting-topics > [2] http://eavesdrop.openstack.org/#CloudKitty_Team_Meeting > [3] https://review.opendev.org/#/c/747256/ > From pierre at stackhpc.com Thu Aug 20 17:54:16 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 20 Aug 2020 19:54:16 +0200 Subject: [cloudkitty] Resuming CloudKitty IRC meetings In-Reply-To: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> References: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> Message-ID: On Thu, 20 Aug 2020 at 19:27, Sean McGinnis wrote: > > On 8/20/20 12:09 PM, Pierre Riteau wrote: > > Hello, > > > > We are resuming IRC meetings for the CloudKitty project using the > > existing calendar schedule. > > The first meeting will be on Monday August 24 at 1400 UTC in > > #cloudkitty on freenode, then every two weeks. > > > > Everyone is welcome: contributors, users, and anyone who would like to > > contribute or use CloudKitty but doesn't know how to get started. > > The agenda is available on Etherpad [1]. > > > > The meeting description [2] was stating that meetings were on the > > first and third Monday of the month, but the calendar schedule was > > using odd weeks. I've submitted a change [3] to synchronise the > > description: let's meet on odd weeks instead of using a month-based > > schedule. > > With what you said above, this is actually taking place on even weeks now. Unless I am mistaken, a schedule based on the Nth day of month is not fixed to even or odd weeks. For example, in June the first and third Monday were in weeks 23 and 25 (odd), but since July they take place in even weeks. What I am proposing is we disregard the existing description and go by the frequency defined in the yaml file, which is biweekly-odd. It means that people who have already imported the calendar invite will have the correct date (although not the right description). And we'll always have two weeks between each meeting, instead of sometimes three. > Can you clarify - is it a one-off that you will be holding this on the > 4th Monday next week? Or do you actually intend to switch these to even > weeks (in which case that patch is incorrect)? > > > > > Thanks in advance to all of you helping to keep the project going. > > > > Pierre Riteau (priteau) > > > > [1] https://etherpad.opendev.org/p/cloudkitty-meeting-topics > > [2] http://eavesdrop.openstack.org/#CloudKitty_Team_Meeting > > [3] https://review.opendev.org/#/c/747256/ > > > From fungi at yuggoth.org Thu Aug 20 18:24:39 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 20 Aug 2020 18:24:39 +0000 Subject: [cloudkitty] Resuming CloudKitty IRC meetings In-Reply-To: References: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> Message-ID: <20200820182438.lrqp5baym5o37nhl@yuggoth.org> On 2020-08-20 19:54:16 +0200 (+0200), Pierre Riteau wrote: [...] 
> Unless I am mistaken, a schedule based on the Nth day of month is not > fixed to even or odd weeks. > For example, in June the first and third Monday were in weeks 23 and > 25 (odd), but since July they take place in even weeks. [...] Correct, this is documented in the README.rst for the yaml2ical library, which irc-meetings uses to render this metadata into scheduling: https://opendev.org/opendev/yaml2ical#user-content-frequencies "biweekly-odd Occurs on odd weeks (ISOweek % 2 == 1)" "Odd/Even and week numbers are based on the ISO week number. ISO weeks can be checked with %V in GNU date(1)" I think some people have assumed it's even/odd week numbers in a month rather than even/odd week numbers counting from the epoch. Technically we have frequencies like "first-tuesday" and "third-tuesday" to address specific week numbers in a month rather than truly alternating weeks. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sean.mcginnis at gmx.com Thu Aug 20 18:37:53 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 20 Aug 2020 13:37:53 -0500 Subject: [cloudkitty] Resuming CloudKitty IRC meetings In-Reply-To: <20200820182438.lrqp5baym5o37nhl@yuggoth.org> References: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> <20200820182438.lrqp5baym5o37nhl@yuggoth.org> Message-ID: > "biweekly-odd Occurs on odd weeks (ISOweek % 2 == 1)" > > "Odd/Even and week numbers are based on the ISO week number. ISO > weeks can be checked with %V in GNU date(1)" > > I think some people have assumed it's even/odd week numbers in a > month rather than even/odd week numbers counting from the epoch. > Technically we have frequencies like "first-tuesday" and > "third-tuesday" to address specific week numbers in a month rather > than truly alternating weeks. Based on the existing description, that appears to be the intent. So it had been the first and third weeks of the month, so therefore the odd weeks. But then we at least have a mismatch between phrasing used and what yaml2ical generates, so the description does need to be updated. From fungi at yuggoth.org Thu Aug 20 19:22:31 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 20 Aug 2020 19:22:31 +0000 Subject: [cloudkitty] Resuming CloudKitty IRC meetings In-Reply-To: References: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> <20200820182438.lrqp5baym5o37nhl@yuggoth.org> Message-ID: <20200820192231.2cv3x3qqm5oaanaz@yuggoth.org> On 2020-08-20 13:37:53 -0500 (-0500), Sean McGinnis wrote: > > > "biweekly-odd Occurs on odd weeks (ISOweek % 2 == 1)" > > > > "Odd/Even and week numbers are based on the ISO week number. ISO > > weeks can be checked with %V in GNU date(1)" > > > > I think some people have assumed it's even/odd week numbers in a > > month rather than even/odd week numbers counting from the epoch. > > Technically we have frequencies like "first-tuesday" and > > "third-tuesday" to address specific week numbers in a month rather > > than truly alternating weeks. > Based on the existing description, that appears to be the intent. So it > had been the first and third weeks of the month, so therefore the odd > weeks. But then we at least have a mismatch between phrasing used and > what yaml2ical generates, so the description does need to be updated. And even I'm easily confused by this, as evidenced by the fact that above I confused epoch weeks with ISO annual week counts. 
As the README.rst suggests, if you have a "53-week" year then you get ISO odd weeks back to back between the end of that year and the start of the next, or a pair of ISO even weeks which are three weeks apart instead of two. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From arnaud.morin at gmail.com Thu Aug 20 19:28:40 2020 From: arnaud.morin at gmail.com (Arnaud MORIN) Date: Thu, 20 Aug 2020 21:28:40 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <20200818120708.GV31915@sync> Message-ID: Hello, Are you doing that using alternate exchange ? I started configuring it in our env but not yet finished. Cheers, Le jeu. 20 août 2020 à 19:16, Fabian Zimmermann a écrit : > Hi, > > just another idea: > > Rabbitmq is able to count undelivered messages. We could use this > information to detect the broken bindings (causing undeliverable messages). > > Anyone already doing this? > > I currently don't have a way to reproduce the broken bindings, so I'm > unable to proof the idea. > > Seems we have to wait issue to happen again - what - hopefully - never > happens :) > > Fabian > > Arnaud Morin schrieb am Di., 18. Aug. 2020, > 14:07: > >> Hey all, >> >> About the vexxhost strategy to use only one rabbit server and manage HA >> through >> rabbit. >> Do you plan to do the same for MariaDB/MySQL? >> >> -- >> Arnaud Morin >> >> On 14.08.20 - 18:45, Fabian Zimmermann wrote: >> > Hi, >> > >> > i read somewhere that vexxhosts kubernetes openstack-Operator is running >> > one rabbitmq Container per Service. Just the kubernetes self healing is >> > used as "ha" for rabbitmq. >> > >> > That seems to match with my finding: run rabbitmq standalone and use an >> > external system to restart rabbitmq if required. >> > >> > Fabian >> > >> > Satish Patel schrieb am Fr., 14. Aug. 2020, >> 16:59: >> > >> > > Fabian, >> > > >> > > what do you mean? >> > > >> > > >> I think vexxhost is running (1) with their openstack-operator - for >> > > reasons. >> > > >> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann >> > > wrote: >> > > > >> > > > Hello again, >> > > > >> > > > just a short update about the results of my tests. >> > > > >> > > > I currently see 2 ways of running openstack+rabbitmq >> > > > >> > > > 1. without durable-queues and without replication - just one >> > > rabbitmq-process which gets (somehow) restarted if it fails. >> > > > 2. durable-queues and replication >> > > > >> > > > Any other combination of these settings leads to more or less >> issues with >> > > > >> > > > * broken / non working bindings >> > > > * broken queues >> > > > >> > > > I think vexxhost is running (1) with their openstack-operator - for >> > > reasons. >> > > > >> > > > I added [kolla], because kolla-ansible is installing rabbitmq with >> > > replication but without durable-queues. >> > > > >> > > > May someone point me to the best way to document these findings to >> some >> > > official doc? >> > > > I think a lot of installations out there will run into issues if - >> under >> > > load - a node fails. >> > > > >> > > > Fabian >> > > > >> > > > >> > > > Am Do., 13. Aug. 
2020 um 15:13 Uhr schrieb Fabian Zimmermann < >> > > dev.faz at gmail.com>: >> > > >> >> > > >> Hi, >> > > >> >> > > >> just did some short tests today in our test-environment (without >> > > durable queues and without replication): >> > > >> >> > > >> * started a rally task to generate some load >> > > >> * kill-9-ed rabbitmq on one node >> > > >> * rally task immediately stopped and the cloud (mostly) stopped >> working >> > > >> >> > > >> after some debugging i found (again) exchanges which had bindings >> to >> > > queues, but these bindings didnt forward any msgs. >> > > >> Wrote a small script to detect these broken bindings and will now >> check >> > > if this is "reproducible" >> > > >> >> > > >> then I will try "durable queues" and "durable queues with >> replication" >> > > to see if this helps. Even if I would expect >> > > >> rabbitmq should be able to handle this without these "hidden broken >> > > bindings" >> > > >> >> > > >> This just FYI. >> > > >> >> > > >> Fabian >> > > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Aug 20 20:02:45 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 20 Aug 2020 15:02:45 -0500 Subject: [release] Release countdown for week R-7 Aug 24 - 28 Message-ID: <20200820200245.GA212631@sm-workstation> Development Focus ----------------- We are entering the last weeks of the Victoria development cycle. From now until the final release, we'll send a countdown email like this every week. It's probably a good time for teams to take stock of their library and client work that needs to be completed yet. The non-client library freeze is coming up, followed closely by the client lib freeze. Please plan accordingly to avoid any last minute rushes to get key functionality in. General Information ------------------- Next week is the Extra-ATC freeze, in preparation for future elections. All contributions to OpenStack are valuable, but some are not expressed as Gerrit code changes. Please list active contributors to your project team who do not have a code contribution this cycle, and therefore won't automatically be considered an Active Technical Contributor and allowed to vote. This is done by adding extra-atcs to https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml before the Extra-ATC freeze on August 28. A quick reminder of the upcoming freeze dates. Those vary depending on deliverable type: * General libraries (except client libraries) need to have their last feature release before Non-client library freeze (Sept 3). Their stable branches are cut early. * Client libraries (think python-*client libraries) need to have their last feature release before Client library freeze (Sept 10) * Deliverables following a cycle-with-rc model (that would be most services) observe a Feature freeze on that same date, Sept 10. Any feature addition beyond that date should be discussed on the mailing-list and get PTL approval. After feature freeze, cycle-with-rc deliverables need to produce a first release candidate (and a stable branch) before RC1 deadline (Sept 24) * Deliverables following cycle-with-intermediary model can release as necessary, but in all cases before Final RC deadline (Oct 8) Finally, now is also a good time to start planning what highlights you want for your deliverables in the cycle highlights. The deadline to submit an initial version for those is set to Feature freeze (Sept 10). 
Background on cycle-highlights: http://lists.openstack.org/pipermail/openstack-dev/2017-December/125613.html Project Team Guide, Cycle-Highlights: https://docs.openstack.org/project-team-guide/release-management.html#cycle-highlights knelson [at] openstack.org/diablo_rojo on IRC is available if you need help selecting or writing your highlights Upcoming Deadlines & Dates -------------------------- Non-client library freeze: September 3 (R-6 week) Client library freeze: September 10 (R-5 week) Victoria-3 milestone: September 10 (R-5 week) Victoria release: October 14 From skaplons at redhat.com Thu Aug 20 20:24:20 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 20 Aug 2020 22:24:20 +0200 Subject: [Neutron] Drivers meeting agenda Message-ID: <20200820202420.flkrajceygsy7y37@skaplons-mac> Hi, Here is agenda for tomorrow's drivers meeting. We have 2 RFEs to discuss: * https://bugs.launchpad.net/neutron/+bug/1891334 - [RFE] Enable change of CIDR on a subnet * https://bugs.launchpad.net/neutron/+bug/1892200 - Make keepalived healthcheck more configurable There are also some points from Rodolfo on the on demand agenda but Rodolfo is on PTO this week so probably we will discuss those topics finally next week. See You on the meeting tomorrow and have a nice day :) -- Slawek Kaplonski Principal software engineer Red Hat From pierre at stackhpc.com Thu Aug 20 20:48:26 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 20 Aug 2020 22:48:26 +0200 Subject: [cloudkitty] Resuming CloudKitty IRC meetings In-Reply-To: <20200820192231.2cv3x3qqm5oaanaz@yuggoth.org> References: <4a4ca182-62fc-a728-65bb-1001510f5af8@gmx.com> <20200820182438.lrqp5baym5o37nhl@yuggoth.org> <20200820192231.2cv3x3qqm5oaanaz@yuggoth.org> Message-ID: Thanks for clearing up the confusion, I had not considered this alternative interpretation of even and odds weeks. And thanks for sharing the tidbit about 53-week years. Should I mention there are months with a fifth Monday as well? ;-) So, I would like to clarify that I am proposing meetings on odd ISO week numbers. This is an initial schedule, which we can adapt based on project activity. On Thu, 20 Aug 2020 at 21:34, Jeremy Stanley wrote: > > On 2020-08-20 13:37:53 -0500 (-0500), Sean McGinnis wrote: > > > > > "biweekly-odd Occurs on odd weeks (ISOweek % 2 == 1)" > > > > > > "Odd/Even and week numbers are based on the ISO week number. ISO > > > weeks can be checked with %V in GNU date(1)" > > > > > > I think some people have assumed it's even/odd week numbers in a > > > month rather than even/odd week numbers counting from the epoch. > > > Technically we have frequencies like "first-tuesday" and > > > "third-tuesday" to address specific week numbers in a month rather > > > than truly alternating weeks. > > Based on the existing description, that appears to be the intent. So it > > had been the first and third weeks of the month, so therefore the odd > > weeks. But then we at least have a mismatch between phrasing used and > > what yaml2ical generates, so the description does need to be updated. > > And even I'm easily confused by this, as evidenced by the fact that > above I confused epoch weeks with ISO annual week counts. As the > README.rst suggests, if you have a "53-week" year then you get ISO > odd weeks back to back between the end of that year and the start > of the next, or a pair of ISO even weeks which are three weeks apart > instead of two. 
> -- > Jeremy Stanley From openstack at nemebean.com Thu Aug 20 21:16:44 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 20 Aug 2020 16:16:44 -0500 Subject: [requirements][oslo] Inclusion of CONFspirator in openstack/requirements In-Reply-To: References: Message-ID: On 8/16/20 11:42 PM, Adrian Turjak wrote: > Hey OpenStackers! > > I'm hoping to add CONFspirator to openstack/requirements as I'm using it > Adjutant: > https://review.opendev.org/#/c/746436/ > > The library has been in Adjutant for a while but I didn't add it to > openstack/requirements, so I'm trying to remedy that now. I think it is > different enough from oslo.config and I think the features/differences > are ones that are unlikely to ever make sense in oslo.config without > breaking it for people who do use it as it is, or adding too much > complexity. > > I wanted to use oslo.config but quickly found that the way I was > currently doing config in Adjutant was heavily dependent on yaml, and > the ability to nest things. I was in a bind because I didn't have a > declarative config system like oslo.config, and the config for Adjutant > was a mess to maintain and understand (even for me, and I wrote it) with > random parts of the code pulling config that may or may not have been > set/declared. > > After finding oslo.config was not suitable for my rather weird needs, I > took oslo.config as a starting point and ended up writing another > library specific to my requirements in Adjutant, and rather than keeping > it internal to Adjutant, moved it to an external library. > > CONFspirator was built for a weird and complex edge case, because I have > plugins that need to dynamically load config on startup, which then has > to be lazy_loaded. I also have weird overlay logic for defaults that can > be overridden, and building it into the library made Adjutant simpler. I > also have nested config groups that need to be named dynamically to > allow plugin classes to be extended without subclasses sharing the same > config group name. I built something specific to my needs, that just so > happens to also be a potentially useful library for people wanting > something like oslo.config but that is targeted towards yaml and toml, > and the ability to nest groups. 
> > The docs are here: https://confspirator.readthedocs.io/ > The code is here: https://gitlab.com/catalyst-cloud/confspirator > > And for those interested in how I use it in Adjutant here are some > places of interest (be warned, it may be a rabbit hole): > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/config > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/feature_set.py > > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/core.py > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/api/v1/openstack.py#L35-L44 > > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/actions/v1/projects.py#L155-L164 > > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/actions/v1/base.py#L146 > > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/tasks/v1/base.py#L30 > > https://opendev.org/openstack/adjutant/src/branch/master/adjutant/tasks/v1/base.py#L293 > > > If there are strong opinions about working to add this to oslo.config, > let's chat, as I'm not against merging this into it somehow if we find a > way that make sense, but while some aspects where similar, I felt that > this was cleaner without being part of oslo.config because the mindset I > was building towards seemed different and oslo.config didn't need my > complexity. Okay, I'll take a crack at discussing this from the Oslo side. First, we've tried to avoid adding YAML support to oslo.config for a couple of reasons: 1) consistency of configs across services. We didn't want to end up with a mix of ini and yaml files. 2) As you discovered, the oslo.config model isn't conducive to nested YAML layouts, so most of the benefits are lost anyway. There are exceptions, of course. Just within oslo, oslo.policy uses YAML configs, but it gives up most of the oslo.config niceties to do so. Policy had to reimplement things like deprecation handling because it's dealing with raw YAML instead of a config object. I believe there are other examples where services had to refer to a YAML file for their complex config opts. With all that said, I'm pretty sure a motivated person could write a YAML driver for oslo.config. It would introduce a layer of indirection - the service would refer to a .conf file containing just the driver config, which would then point to a separate .yaml file. I'm not sure you could implement nesting this way, but I haven't dug into the code to find out for sure. In general, given the complexity of what you're talking about I think a driver plugin would be the way to go, as opposed to trying to fit this all in with the core oslo.config functionality (assuming you/we decide to pursue integrating at all). There were a few other things you mentioned as features of the library. The following are some off-the-cuff thoughts without having looked too closely at the library, so if they're nonsense that's my excuse. ;-) "because I have plugins that need to dynamically load config on startup, which then has to be lazy_loaded" Something like this could probably be done. I believe this is kind of how the castellan driver in oslo.config works. Config values are looked up and cached on-demand, as opposed to all at once. "I also have weird overlay logic for defaults that can be overridden" My knee-jerk reaction to this is that oslo.config already supports overriding defaults, so I assume there's something about your use case that didn't mesh with that functionality? Or is this part of oslo.config that you reused? 
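To make that indirection concrete: with the pluggable config sources oslo.config already has, it might look something like the sketch below, where the "yaml" driver is the hypothetical new piece (today the reference examples are drivers like remote_file and castellan):

# service.conf - only the source definition lives here; everything else
# would come from the referenced YAML file. The "yaml" driver and its
# "path" option are hypothetical.
[DEFAULT]
config_source = extra_yaml

[extra_yaml]
driver = yaml
path = /etc/myservice/myservice.yaml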
"I also have nested config groups that need to be named dynamically to allow plugin classes to be extended without subclasses sharing the same config group name." Minus the nesting part, this is also something being done with oslo.config today. The config driver feature actually uses dynamically named groups, and I believe at least Cinder is using them too. They do cause a bit of grief for some config tools, like the validator, but it is possible to do something like this. Now, as to the question of whether we should try to integrate this library with oslo.config: I don't know. Helpful, right? ;-) I think answering that question definitively would take a deeper dive into what the new library is doing differently than I can commit to. As I noted above, I don't think the things you're doing are so far out in left field that it would be unreasonable to consider integrating into oslo.config, but the devil is in the details and there are a lot of details here that I don't know anything about. For example, will the oslo.config data model even accommodate nested groups? I suspect it doesn't now, but could it? Probably, but I can't say how difficult/disruptive it would be. If someone wanted to make incremental progress toward integration, I think writing a basic YAML driver for oslo.config would be a good start. It would be generally useful even if this work doesn't go anywhere, and it would provide a basis for a hypothetical future YAML-based driver providing CONFspirator functionality. From there we could start looking to integrate more advanced features one at a time. Apologies for the wall of text. I hope you got something out of this before your eyes glazed over. :-) -Ben From openstack at nemebean.com Thu Aug 20 21:41:07 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 20 Aug 2020 16:41:07 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <20200820153503.GY31915@sync> References: <2af09e63936f75489946ea6b70c41d6e091531ee.camel@redhat.com> <7496bd35-856e-f48f-b6d8-65155b1777f1@openstack.org> <16a3adf0-2f51-dd7d-c729-7b27f1593980@nemebean.com> <6e68d1a3cfc4efff91d3668bb53805dc469673c6.camel@redhat.com> <65204b738f13fcea16b9b6d5a68149c89be73e6a.camel@redhat.com> <20200820153503.GY31915@sync> Message-ID: Thanks for your patience with this! In the last Oslo meeting we had discussed possibly adding some sort of ping client to oslo.messaging to provide a common interface to use this. That would mitigate some of the concerns about everyone having to write their own ping test and potentially sending incorrect messages on the rabbit bus. Obviously that would be done as a followup to this, but I thought I'd mention it in case anyone wants to take a crack at writing something up. On 8/20/20 10:35 AM, Arnaud Morin wrote: > Hey all, > > TLDR: > - Patch in [1] updated > - Example of usage in [3] > - Agree with fixing nova/rabbit/oslo but would like to keep this ping > endpoint also > - Totally agree with documentation needed > > Long: > > Thank you all for your review and for the great information you bring to > that topic! > > First thing, we are not yet using that patch in production, but in > testing/dev only for now (at OVH). > But the plan is to use it in production ASAP. > > Also, we initially pushed that for neutron agent, that's why I missed > the fact that nova already used the "ping" endpoint, sorry for that. 
> > Anyway, I dont care about the naming, so in latest patchset of [1], you > will see that I changed the name of the endpoint following Ken Giusti > suggestions. > > The bug reported in [2] looks very similar to what we saw. > Thank you Sean for bringing that to attention in this thread. > > To detect this error, using the above "ping" endpoint in oslo, we can > use a script like the one in [3] (sorry about it, I can write better > python :p). > As mentionned by Sean in a previous mail, I am calling effectively > the topic "compute.host123456.sbg5.cloud.ovh.net" in "nova" exchange. > My initial plan would be to identify topics related to a compute and do > pings in all topics, to make sure that all of them are answering. > I am not yet sure about how often and if this is a good plan btw. > > Anyway, the compute is reporting status as UP, but the ping is > timeouting, which is exactly what I wanted to detect! > > I mostly agree with all your comments about the fact that this is a > trick that we do as operator, and using the RPC bus is maybe not the > best approach, but this is pragmatic and quite simple IMHO. > What I also like in this solution is the fact that this is partialy > outside of OpenStack: the endpoint is inside, but doing the ping is > external. > Monitoring OpenStack is not always easy, and sometimes we struggle on > finding the root cause of some issues. Having such endpoint > allow us to monitor OpenStack from an external point of view, but still > in a deeper way. > It's like a probe in your car telling you that even if you are still > running, your engine is off :) > > Still, making sure that this bug is fixed by doing some work on > (rabbit|oslo.messaging|nova|whatever} is the best thing to do. > > However, IMO, this does not prevent this rpc ping endpoint from > existing. > > Last, but not least, I totally agree about documenting this, but also > adding some documentation on how to configure rabbit and OpenStack > services in a way that fit operator needs. > There are plenty of parameters which could be tweaked on both OpenStack > and rabbit side. IMO, we need to explain a little bit more what are the > impact of setting a specific parameter to a given value. > For example, in another discussion ([4]), we were talking about > "durable" queues in rabbit. We manage to find that if we enable HA, we > should also enable durability of queues. > > Anyway that's another topic, and this is also something we discuss in > large-scale group. > > Thank you all, > > [1] https://review.opendev.org/#/c/735385/ > [2] https://bugs.launchpad.net/nova/+bug/1854992 > [3] http://paste.openstack.org/show/796990/ > [4] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016362.html > > From its-openstack at zohocorp.com Fri Aug 21 04:23:29 2020 From: its-openstack at zohocorp.com (its-openstack at zohocorp.com) Date: Fri, 21 Aug 2020 09:53:29 +0530 Subject: per user quota not working properly Message-ID: <1740f41dfb5.b8327d7677873.277850771393902971@zohocorp.com> Dear openstack, We are facing a peculiar issue with regards to users quota of resources. e.g: s.no project user instance quota no instance created 1 test - 10 2 test user1 2 2 3 test user2 2 error "quota over" 4 test user3 3 able to create only 1 instance 5 test user4 no user quota defined able to create 10 instance As you see from mentioned table. when user1,user2, has instance quota of 2 and when user1 has created 2 instance, user2 unable to create instance. 
but user3 able to create only 1 more instance, user 4 has no quota applied so project quota 10 will be applied and he can create 10 instance. the quota is applied to each user but not tracked for each user, so this defeats the purpose of per user quota. Please help us with resolving this issue.     Regards, sysadmin team -------------- next part -------------- An HTML attachment was scrubbed... URL: From adriant at catalystcloud.nz Fri Aug 21 04:59:24 2020 From: adriant at catalystcloud.nz (Adrian Turjak) Date: Fri, 21 Aug 2020 16:59:24 +1200 Subject: [requirements][oslo] Inclusion of CONFspirator in openstack/requirements In-Reply-To: <431d53d1-d92c-913b-f0a5-2be33b0c4e7a@catalyst.net.nz> References: <431d53d1-d92c-913b-f0a5-2be33b0c4e7a@catalyst.net.nz> Message-ID: <8c83ddc7-3f53-3dc5-8116-436bf6f03064@catalystcloud.nz> On 21/08/20 9:16 am, Ben Nemec wrote: > > In general, given the complexity of what you're talking about I think > a driver plugin would be the way to go, as opposed to trying to fit > this all in with the core oslo.config functionality (assuming you/we > decide to pursue integrating at all). > > There were a few other things you mentioned as features of the > library. The following are some off-the-cuff thoughts without having > looked too closely at the library, so if they're nonsense that's my > excuse. > > "because I have plugins that need to dynamically load config on > startup, which then has to be lazy_loaded" > > Something like this could probably be done. I believe this is kind of > how the castellan driver in oslo.config works. Config values are > looked up and cached on-demand, as opposed to all at once. This one is a little weird, but essentially the way this works this in Adjutant: I load the config so I can start the app and go through base logic and loading plugins, but the config groups that are pulled from plugins aren't added to my config group tree until AFTER the config has already been loaded. So part of the config is usable, but some hasn't yet fully been loaded until after the plugins are done, and then that subtree will lazy_load itself when first accessed. It means that until a given lazy_loaded group is actually accessed as config, the config tree underneath it can still have config options added. It's likely not too crazy to do this in oslo, and have groups only read from the cached source (loaded file dict) when first accessed. > > "I also have weird overlay logic for defaults that can be overridden" > > My knee-jerk reaction to this is that oslo.config already supports > overriding defaults, so I assume there's something about your use case > that didn't mesh with that functionality? Or is this part of > oslo.config that you reused? Sooo, this one is a little special because what this feature lets you do is take any group in the config tree once loaded, and call the overlay function on it with either a dict, or another group. The returned value will be a deep copy of the config tree with the values present in the given dict/group overlaid on that original group. As in a depth first dict update, where only keys that exist on the overriding dict will be updated in the copy of the original dict. I need to write the docs for this... 
with a sane example, but here is my unit test for it: https://gitlab.com/catalyst-cloud/confspirator/-/blob/master/confspirator/tests/test_configs.py#L211 I use this in Adjutant by having some config groups which are a default for something, and another place where many things can override that default via another group in the config, so I create an overlay copy and pass that to the place that needs the config it it's most specific case when the code actually pulls values from the conf group I passed it. See: https://opendev.org/openstack/adjutant/src/branch/master/adjutant/actions/v1/base.py#L146-L167 I hope that helps, because it is in my mind a little odd to explain, but it allows some useful things when reusing actions in different tasks. > "I also have nested config groups that need to be named dynamically to > allow plugin classes to be extended without subclasses sharing the > same config group name." > > Minus the nesting part, this is also something being done with > oslo.config today. The config driver feature actually uses dynamically > named groups, and I believe at least Cinder is using them too. They do > cause a bit of grief for some config tools, like the validator, but it > is possible to do something like this. Cool! yeah I implemented that feature in my code because I ran into a case of confliciting namespaces and needed to find a better way to handle subclasses that needed to have different dynamic names for their config groups. > > Now, as to the question of whether we should try to integrate this > library with oslo.config: I don't know. Helpful, right? That's mostly where we got to last time I asked in #openstack-oslo! > > I think answering that question definitively would take a deeper dive > into what the new library is doing differently than I can commit to. > As I noted above, I don't think the things you're doing are so far out > in left field that it would be unreasonable to consider integrating > into oslo.config, but the devil is in the details and there are a lot > of details here that I don't know anything about. For example, will > the oslo.config data model even accommodate nested groups? I suspect > it doesn't now, but could it? Probably, but I can't say how > difficult/disruptive it would be. I looked into it briefly, and to do what I wanted, while also maintaining oslo.config how it was... ended up a bit messy, so I gave up because it would take too long and the politics of trying to get it merged/reviewed wouldn't be worth the effort most likely. > > If someone wanted to make incremental progress toward integration, I > think writing a basic YAML driver for oslo.config would be a good > start. It would be generally useful even if this work doesn't go > anywhere, and it would provide a basis for a hypothetical future > YAML-based driver providing CONFspirator functionality. From there we > could start looking to integrate more advanced features one at a time. > > Apologies for the wall of text. I hope you got something out of this > before your eyes glazed over. > > -Ben > Thanks for the wall of text! It was useful! I think ultimately it may be safer just maintaining my own separate library. For people who don't want to use .ini and prefer yaml/toml it's simpler, and for people who prefer .ini and don't need nesting etc, it's safer to keep oslo.config as it is. If someone does find anything I've done that makes sense in oslo.config I'd be happy to work porting it over, but I don't want to make structural changes to it. 
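Coming back to the overlay behaviour described above: stripped of CONFspirator's own classes, the dict case is roughly just a recursive update applied to a deep copy. This is an illustration only (names made up, not the library's actual code):

    import copy

    def overlay(base, override):
        """Depth-first dict update on a deep copy of 'base'."""
        result = copy.deepcopy(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = overlay(result[key], value)
            else:
                result[key] = value
        return result

    defaults = {'email': {'subject': 'welcome', 'reply_to': 'no-reply@example.com'}}
    task_overrides = {'email': {'subject': 'project created'}}

    print(overlay(defaults, task_overrides))
    # {'email': {'subject': 'project created', 'reply_to': 'no-reply@example.com'}}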
I'll always keep an eye on oslo.config, and may occasionally steal the odd idea if you add something cool, but I think other than my use of types.py and some of the Opt classes as a base, my code has diverged quite a bit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Fri Aug 21 07:06:24 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 21 Aug 2020 09:06:24 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <20200818120708.GV31915@sync> Message-ID: Hi, don't understand what you mean with "alternate exchange"? I'm doing all my tests on my DEV-Env? It's a completely separated / dedicated (virtual) cluster. I just enabled the feature and wrote a small script to read the metrics from the api. I'm having some "dropped msg" in my cluster, just trying to figure out if they are "normal". Fabian Am Do., 20. Aug. 2020 um 21:28 Uhr schrieb Arnaud MORIN : > > Hello, > Are you doing that using alternate exchange ? > I started configuring it in our env but not yet finished. > > Cheers, > > Le jeu. 20 août 2020 à 19:16, Fabian Zimmermann a écrit : >> >> Hi, >> >> just another idea: >> >> Rabbitmq is able to count undelivered messages. We could use this information to detect the broken bindings (causing undeliverable messages). >> >> Anyone already doing this? >> >> I currently don't have a way to reproduce the broken bindings, so I'm unable to proof the idea. >> >> Seems we have to wait issue to happen again - what - hopefully - never happens :) >> >> Fabian >> >> Arnaud Morin schrieb am Di., 18. Aug. 2020, 14:07: >>> >>> Hey all, >>> >>> About the vexxhost strategy to use only one rabbit server and manage HA through >>> rabbit. >>> Do you plan to do the same for MariaDB/MySQL? >>> >>> -- >>> Arnaud Morin >>> >>> On 14.08.20 - 18:45, Fabian Zimmermann wrote: >>> > Hi, >>> > >>> > i read somewhere that vexxhosts kubernetes openstack-Operator is running >>> > one rabbitmq Container per Service. Just the kubernetes self healing is >>> > used as "ha" for rabbitmq. >>> > >>> > That seems to match with my finding: run rabbitmq standalone and use an >>> > external system to restart rabbitmq if required. >>> > >>> > Fabian >>> > >>> > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: >>> > >>> > > Fabian, >>> > > >>> > > what do you mean? >>> > > >>> > > >> I think vexxhost is running (1) with their openstack-operator - for >>> > > reasons. >>> > > >>> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann >>> > > wrote: >>> > > > >>> > > > Hello again, >>> > > > >>> > > > just a short update about the results of my tests. >>> > > > >>> > > > I currently see 2 ways of running openstack+rabbitmq >>> > > > >>> > > > 1. without durable-queues and without replication - just one >>> > > rabbitmq-process which gets (somehow) restarted if it fails. >>> > > > 2. durable-queues and replication >>> > > > >>> > > > Any other combination of these settings leads to more or less issues with >>> > > > >>> > > > * broken / non working bindings >>> > > > * broken queues >>> > > > >>> > > > I think vexxhost is running (1) with their openstack-operator - for >>> > > reasons. >>> > > > >>> > > > I added [kolla], because kolla-ansible is installing rabbitmq with >>> > > replication but without durable-queues. >>> > > > >>> > > > May someone point me to the best way to document these findings to some >>> > > official doc? 
>>> > > > I think a lot of installations out there will run into issues if - under >>> > > load - a node fails. >>> > > > >>> > > > Fabian >>> > > > >>> > > > >>> > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < >>> > > dev.faz at gmail.com>: >>> > > >> >>> > > >> Hi, >>> > > >> >>> > > >> just did some short tests today in our test-environment (without >>> > > durable queues and without replication): >>> > > >> >>> > > >> * started a rally task to generate some load >>> > > >> * kill-9-ed rabbitmq on one node >>> > > >> * rally task immediately stopped and the cloud (mostly) stopped working >>> > > >> >>> > > >> after some debugging i found (again) exchanges which had bindings to >>> > > queues, but these bindings didnt forward any msgs. >>> > > >> Wrote a small script to detect these broken bindings and will now check >>> > > if this is "reproducible" >>> > > >> >>> > > >> then I will try "durable queues" and "durable queues with replication" >>> > > to see if this helps. Even if I would expect >>> > > >> rabbitmq should be able to handle this without these "hidden broken >>> > > bindings" >>> > > >> >>> > > >> This just FYI. >>> > > >> >>> > > >> Fabian >>> > > From skaplons at redhat.com Fri Aug 21 07:21:55 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 21 Aug 2020 09:21:55 +0200 Subject: [neutron] Wallaby PTG planning Message-ID: <20200821072155.pfdjttikum5r54hz@skaplons-mac> Hi, It's again that time of the cycle (time flies) when we need to start thinking about next cycle already. As You probably know, next virtual PTG will be in October 26-30. I need to book some space for the Neuton team before 11th of September so I prepared doodle [1] with possible time slots. Please reply there what are the best days and hours for You so we can try to schedule our sessions in the time slots which fits best most of us :) Please fill this doodle before 4.09 so I will have time to summarize it and book some slots for us. I also prepared etherpad [2]. Please add Your name if You are going to attend the PTG sessions. Please also add proposals of the topics which You want to discuss during the PTG. [1] https://doodle.com/poll/2ppmnua2nuva5nyp [2] https://etherpad.opendev.org/p/neutron-wallaby-ptg -- Slawek Kaplonski Principal software engineer Red Hat From emiller at genesishosting.com Fri Aug 21 07:43:04 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Fri, 21 Aug 2020 02:43:04 -0500 Subject: Using os_token Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814560@gmsxchsvr01.thecreation.com> Hi, It looks like the OS_TOKEN environment variable can be set so a token can be re-used instead of a new authentication for each CLI command with the OpenStack Client. I'm a little confused as to how this works and haven't found any good documentation on the subject. I would have expected to be able to: 0) set appropriate OS_* variables for password authentication 1) create a token using "openstack token issue" 2) unset all OS_* environment variables 3) set OS_TOKEN to the token's value provided in #1 4) set OS_AUTH_TYPE to "v3token" 5) set OS_AUTH_URL to the respective KeyStone endpoint 6) set OS_IDENTITY_API_VERSION to "3" 7) use the CLI as normal However, I get a "The service catalog is empty." message. Maybe I'm missing something above or am I completely misunderstanding the purpose of the OS_TOKEN variable? >From examples I have seen, it looks like the token can be used in a REST API call. 
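Something along these lines, with the token passed in the X-Auth-Token header (the endpoint URL below is just an illustration):

    import requests

    token = '<value from "openstack token issue">'

    # list servers straight from the compute API, no new token issued
    resp = requests.get(
        'https://compute.example.com:8774/v2.1/servers',
        headers={'X-Auth-Token': token},
    )
    print(resp.status_code)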
Is there a way to use an existing token with the CLI, instead, so a new token is not issued for every CLI command instantiation? Thanks! Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From kotobi at dkrz.de Fri Aug 21 07:54:49 2020 From: kotobi at dkrz.de (Amjad Kotobi) Date: Fri, 21 Aug 2020 09:54:49 +0200 Subject: [glance][horizon][ops] Dashboard show Forbidden 403 but openstack cli works fine Message-ID: <04C431DA-85DE-40FB-92CE-3A7273E98E11@dkrz.de> Hi, We are running Train release, currently facing below error when change to “Images” panel on the project which I have “User” role. The image is visibility is public. Error: Forbidden. Insufficient permissions of the requested operation Error: Unable to retrieve the project. By using openstack-cli everything works and I do not face “Forbidden” 403, but in dashboard “access.log” it shows "GET /dashboard/api/keystone/projects/7331defd55ef479fbf0a9a1ac3fe9055 HTTP/1.1” 403 http://xxxxxx/dashboard/project/images” I checked: 1. glance: policy.json from both dashboard and api/registry hosts are same. 2. From dashboard API images is 2 As soon as I change the OWNER of image to the project which my role = USER the error disappears. Any ideas or previous encounter similar to this issue? Thanks Amjad -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5223 bytes Desc: not available URL: From arnaud.morin at gmail.com Fri Aug 21 08:13:32 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Fri, 21 Aug 2020 08:13:32 +0000 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200818120708.GV31915@sync> Message-ID: <20200821081332.GZ31915@sync> Hey, I am talking about that: https://www.rabbitmq.com/ae.html Cheers, -- Arnaud Morin On 21.08.20 - 09:06, Fabian Zimmermann wrote: > Hi, > > don't understand what you mean with "alternate exchange"? I'm doing > all my tests on my DEV-Env? It's a completely separated / dedicated > (virtual) cluster. > > I just enabled the feature and wrote a small script to read the > metrics from the api. > > I'm having some "dropped msg" in my cluster, just trying to figure out > if they are "normal". > > Fabian > > Am Do., 20. Aug. 2020 um 21:28 Uhr schrieb Arnaud MORIN > : > > > > Hello, > > Are you doing that using alternate exchange ? > > I started configuring it in our env but not yet finished. > > > > Cheers, > > > > Le jeu. 20 août 2020 à 19:16, Fabian Zimmermann a écrit : > >> > >> Hi, > >> > >> just another idea: > >> > >> Rabbitmq is able to count undelivered messages. We could use this information to detect the broken bindings (causing undeliverable messages). > >> > >> Anyone already doing this? > >> > >> I currently don't have a way to reproduce the broken bindings, so I'm unable to proof the idea. > >> > >> Seems we have to wait issue to happen again - what - hopefully - never happens :) > >> > >> Fabian > >> > >> Arnaud Morin schrieb am Di., 18. Aug. 2020, 14:07: > >>> > >>> Hey all, > >>> > >>> About the vexxhost strategy to use only one rabbit server and manage HA through > >>> rabbit. > >>> Do you plan to do the same for MariaDB/MySQL? > >>> > >>> -- > >>> Arnaud Morin > >>> > >>> On 14.08.20 - 18:45, Fabian Zimmermann wrote: > >>> > Hi, > >>> > > >>> > i read somewhere that vexxhosts kubernetes openstack-Operator is running > >>> > one rabbitmq Container per Service. 
Just the kubernetes self healing is > >>> > used as "ha" for rabbitmq. > >>> > > >>> > That seems to match with my finding: run rabbitmq standalone and use an > >>> > external system to restart rabbitmq if required. > >>> > > >>> > Fabian > >>> > > >>> > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > >>> > > >>> > > Fabian, > >>> > > > >>> > > what do you mean? > >>> > > > >>> > > >> I think vexxhost is running (1) with their openstack-operator - for > >>> > > reasons. > >>> > > > >>> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > >>> > > wrote: > >>> > > > > >>> > > > Hello again, > >>> > > > > >>> > > > just a short update about the results of my tests. > >>> > > > > >>> > > > I currently see 2 ways of running openstack+rabbitmq > >>> > > > > >>> > > > 1. without durable-queues and without replication - just one > >>> > > rabbitmq-process which gets (somehow) restarted if it fails. > >>> > > > 2. durable-queues and replication > >>> > > > > >>> > > > Any other combination of these settings leads to more or less issues with > >>> > > > > >>> > > > * broken / non working bindings > >>> > > > * broken queues > >>> > > > > >>> > > > I think vexxhost is running (1) with their openstack-operator - for > >>> > > reasons. > >>> > > > > >>> > > > I added [kolla], because kolla-ansible is installing rabbitmq with > >>> > > replication but without durable-queues. > >>> > > > > >>> > > > May someone point me to the best way to document these findings to some > >>> > > official doc? > >>> > > > I think a lot of installations out there will run into issues if - under > >>> > > load - a node fails. > >>> > > > > >>> > > > Fabian > >>> > > > > >>> > > > > >>> > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > >>> > > dev.faz at gmail.com>: > >>> > > >> > >>> > > >> Hi, > >>> > > >> > >>> > > >> just did some short tests today in our test-environment (without > >>> > > durable queues and without replication): > >>> > > >> > >>> > > >> * started a rally task to generate some load > >>> > > >> * kill-9-ed rabbitmq on one node > >>> > > >> * rally task immediately stopped and the cloud (mostly) stopped working > >>> > > >> > >>> > > >> after some debugging i found (again) exchanges which had bindings to > >>> > > queues, but these bindings didnt forward any msgs. > >>> > > >> Wrote a small script to detect these broken bindings and will now check > >>> > > if this is "reproducible" > >>> > > >> > >>> > > >> then I will try "durable queues" and "durable queues with replication" > >>> > > to see if this helps. Even if I would expect > >>> > > >> rabbitmq should be able to handle this without these "hidden broken > >>> > > bindings" > >>> > > >> > >>> > > >> This just FYI. > >>> > > >> > >>> > > >> Fabian > >>> > > From dev.faz at gmail.com Fri Aug 21 08:28:32 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 21 Aug 2020 10:28:32 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: <20200821081332.GZ31915@sync> References: <20200818120708.GV31915@sync> <20200821081332.GZ31915@sync> Message-ID: Hi, yeah, that's what I'm currently using. I also tried to use the unroutable-counters, but these are only available for channels, which may not have any bindings, so there is no way to find the "root cause" I created an AE "unroutable" and wrote a script to show me the msgs placed here.. 
currently I get -- 20 Exchange: q-agent-notifier-network-delete_fanout, RoutingKey: 226 Exchange: q-agent-notifier-port-delete_fanout, RoutingKey: 88 Exchange: q-agent-notifier-port-update_fanout, RoutingKey: 388 Exchange: q-agent-notifier-security_group-update_fanout, RoutingKey: -- I think I will start another thread to debug the reason for this, because it has nothing to do with "broken bindings". Fabian Am Fr., 21. Aug. 2020 um 10:13 Uhr schrieb Arnaud Morin : > > Hey, > I am talking about that: > https://www.rabbitmq.com/ae.html > > Cheers, > > -- > Arnaud Morin > > On 21.08.20 - 09:06, Fabian Zimmermann wrote: > > Hi, > > > > don't understand what you mean with "alternate exchange"? I'm doing > > all my tests on my DEV-Env? It's a completely separated / dedicated > > (virtual) cluster. > > > > I just enabled the feature and wrote a small script to read the > > metrics from the api. > > > > I'm having some "dropped msg" in my cluster, just trying to figure out > > if they are "normal". > > > > Fabian > > > > Am Do., 20. Aug. 2020 um 21:28 Uhr schrieb Arnaud MORIN > > : > > > > > > Hello, > > > Are you doing that using alternate exchange ? > > > I started configuring it in our env but not yet finished. > > > > > > Cheers, > > > > > > Le jeu. 20 août 2020 à 19:16, Fabian Zimmermann a écrit : > > >> > > >> Hi, > > >> > > >> just another idea: > > >> > > >> Rabbitmq is able to count undelivered messages. We could use this information to detect the broken bindings (causing undeliverable messages). > > >> > > >> Anyone already doing this? > > >> > > >> I currently don't have a way to reproduce the broken bindings, so I'm unable to proof the idea. > > >> > > >> Seems we have to wait issue to happen again - what - hopefully - never happens :) > > >> > > >> Fabian > > >> > > >> Arnaud Morin schrieb am Di., 18. Aug. 2020, 14:07: > > >>> > > >>> Hey all, > > >>> > > >>> About the vexxhost strategy to use only one rabbit server and manage HA through > > >>> rabbit. > > >>> Do you plan to do the same for MariaDB/MySQL? > > >>> > > >>> -- > > >>> Arnaud Morin > > >>> > > >>> On 14.08.20 - 18:45, Fabian Zimmermann wrote: > > >>> > Hi, > > >>> > > > >>> > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > >>> > one rabbitmq Container per Service. Just the kubernetes self healing is > > >>> > used as "ha" for rabbitmq. > > >>> > > > >>> > That seems to match with my finding: run rabbitmq standalone and use an > > >>> > external system to restart rabbitmq if required. > > >>> > > > >>> > Fabian > > >>> > > > >>> > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > >>> > > > >>> > > Fabian, > > >>> > > > > >>> > > what do you mean? > > >>> > > > > >>> > > >> I think vexxhost is running (1) with their openstack-operator - for > > >>> > > reasons. > > >>> > > > > >>> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > >>> > > wrote: > > >>> > > > > > >>> > > > Hello again, > > >>> > > > > > >>> > > > just a short update about the results of my tests. > > >>> > > > > > >>> > > > I currently see 2 ways of running openstack+rabbitmq > > >>> > > > > > >>> > > > 1. without durable-queues and without replication - just one > > >>> > > rabbitmq-process which gets (somehow) restarted if it fails. > > >>> > > > 2. 
durable-queues and replication > > >>> > > > > > >>> > > > Any other combination of these settings leads to more or less issues with > > >>> > > > > > >>> > > > * broken / non working bindings > > >>> > > > * broken queues > > >>> > > > > > >>> > > > I think vexxhost is running (1) with their openstack-operator - for > > >>> > > reasons. > > >>> > > > > > >>> > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > >>> > > replication but without durable-queues. > > >>> > > > > > >>> > > > May someone point me to the best way to document these findings to some > > >>> > > official doc? > > >>> > > > I think a lot of installations out there will run into issues if - under > > >>> > > load - a node fails. > > >>> > > > > > >>> > > > Fabian > > >>> > > > > > >>> > > > > > >>> > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > >>> > > dev.faz at gmail.com>: > > >>> > > >> > > >>> > > >> Hi, > > >>> > > >> > > >>> > > >> just did some short tests today in our test-environment (without > > >>> > > durable queues and without replication): > > >>> > > >> > > >>> > > >> * started a rally task to generate some load > > >>> > > >> * kill-9-ed rabbitmq on one node > > >>> > > >> * rally task immediately stopped and the cloud (mostly) stopped working > > >>> > > >> > > >>> > > >> after some debugging i found (again) exchanges which had bindings to > > >>> > > queues, but these bindings didnt forward any msgs. > > >>> > > >> Wrote a small script to detect these broken bindings and will now check > > >>> > > if this is "reproducible" > > >>> > > >> > > >>> > > >> then I will try "durable queues" and "durable queues with replication" > > >>> > > to see if this helps. Even if I would expect > > >>> > > >> rabbitmq should be able to handle this without these "hidden broken > > >>> > > bindings" > > >>> > > >> > > >>> > > >> This just FYI. > > >>> > > >> > > >>> > > >> Fabian > > >>> > > From emiller at genesishosting.com Fri Aug 21 08:29:30 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Fri, 21 Aug 2020 03:29:30 -0500 Subject: Using os_token Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814561@gmsxchsvr01.thecreation.com> I happened to run across an unrelated github issue: https://github.com/terraform-providers/terraform-provider-openstack/issu es/271 which gave me a clue to what I was missing. I needed to include some additional variables (see steps 7 through 9 below). Revised steps - which works fine with the OpenStack Client: 0) set appropriate OS_* variables for password authentication 1) create a token using "openstack token issue" 2) unset all OS_* environment variables 3) set OS_TOKEN to the token's value provided in #1 4) set OS_AUTH_TYPE to "v3token" 5) set OS_AUTH_URL to the respective KeyStone endpoint 6) set OS_IDENTITY_API_VERSION to "3" 7) set OS_PROJECT_DOMAIN_ID as appropriate 8) set OS_PROJECT_NAME as appropriate 9) set OS_REGION_NAME as appropriate 10) use the CLI as normal This shaves anywhere from 0.2 to 0.6 seconds off of a test command I'm running when compared to password authentication (which normally takes about 2.5 seconds to run), where a new token is issued each time. openstack token revoke works as expected too. Eric From dev.faz at gmail.com Fri Aug 21 08:32:09 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 21 Aug 2020 10:32:09 +0200 Subject: [neutron][ops] q-agent-notifier exchanges without bindings. Message-ID: Hi, im currently on the way to analyse some rabbitmq-issues. 
atm im taking a look on "unroutable messages", so I * created an Alternative Exchange and Queue: "unroutable" * created a policy to send all unroutable msgs to this exchange/queue. * wrote a script to show me the msgs placed here.. currently I get Seems like my neutron is placing msgs in these exchanges, but there is nobody listening/binding to: -- 20 Exchange: q-agent-notifier-network-delete_fanout, RoutingKey: 226 Exchange: q-agent-notifier-port-delete_fanout, RoutingKey: 88 Exchange: q-agent-notifier-port-update_fanout, RoutingKey: 388 Exchange: q-agent-notifier-security_group-update_fanout, RoutingKey: -- Is someone able to give me a hint where to look at / how to debug this? Fabian From moreira.belmiro.email.lists at gmail.com Fri Aug 21 09:26:35 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Fri, 21 Aug 2020 11:26:35 +0200 Subject: [nova][ops] Live migration and CPU features In-Reply-To: <20200819092130.GX31915@sync> References: <44347504ff7308a6c3b4155060c778fad368a002.camel@redhat.com> <20200819092130.GX31915@sync> Message-ID: Hi, thank you all for your comments/suggestions. Having a "custom" cpu_mode seems the best option for our use case. "host-passhtough" is problematic when the hardware is retired and instances need to be moved to newer compute nodes. Belmiro On Wed, Aug 19, 2020 at 11:21 AM Arnaud Morin wrote: > > Hello, > > We have the same kind of issue. > To help mitigate it, we do segregation and also use cpu_mode=custom, so we > can use a model which is close to our hardware (cpu_model=Haswell-noTSX) > and add extra_flags when needed. > > This is painful. > > Cheers, > > -- > Arnaud Morin > > On 18.08.20 - 16:16, Sean Mooney wrote: > > On Tue, 2020-08-18 at 17:06 +0200, Fabian Zimmermann wrote: > > > Hi, > > > > > > We are using the "custom"-way. But this does not protect you from all > issues. > > > > > > We had problems with a new cpu-generation not (jet) detected correctly > > > in an libvirt-version. So libvirt failed back to the "desktop"-cpu of > > > this newer generation, but didnt support/detect some features => > > > blocked live-migration. > > yes that is common when using really new hardware. having previouly > worked > > at intel and hitting this often that one of the reason i tend to default > to host-passthouh > > and recommend using AZ or aggreate to segreatate the cloud for live > migration. > > > > in the case where your libvirt does not know about the new cpus your > best approch is to use the > > newest server cpu model that it know about and then if you really need > the new fature you can try > > to add theem using the config options but that is effectivly the same > as using host-passhtough > > which is why i default to that as a workaround instead. > > > > > > > > Fabian > > > > > > Am Di., 18. Aug. 2020 um 16:54 Uhr schrieb Belmiro Moreira > > > : > > > > > > > > Hi, > > > > in our infrastructure we have always compute nodes that need a > hardware intervention and as a consequence they are > > > > rebooted, bringing a new kernel, kvm, ... > > > > > > > > In order to have a good compromise between performance and > flexibility (live migration) we have been using "host- > > > > model" for the "cpu_mode" configuration of our service VMs. We > didn't expect to have CPU compatibility issues > > > > because we have the same hardware type per cell. 
> > > > > > > > The problem is that when a compute node is rebooted the instance > domain is recreated with the new cpu features that > > > > were introduced because of the reboot (using centOS). > > > > > > > > If there are new CPU features exposed, this basically blocks live > migration to all the non rebooted compute nodes > > > > (those cpu features are not exposed, yet). The nova-scheduler > doesn't know about them when scheduling the live > > > > migration destination. > > > > > > > > I wonder how other operators are solving this issue. > > > > I don't like stopping OS upgrades. > > > > What I'm considering is to define a "custom" cpu_mode for each > hardware type. > > > > > > > > I would appreciate your comments and learn how you are solving this > problem. > > > > > > > > Belmiro > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Fri Aug 21 11:29:15 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 21 Aug 2020 13:29:15 +0200 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200818120708.GV31915@sync> <20200821081332.GZ31915@sync> Message-ID: Hi, just to keep you updated. It seems these "q-agent-notifier"-exchanges are not used by every possible neutron-driver/agent-backend, so it seems to be fine to have unrouted msgs here. I was (again) able to get some broken bindings in my dev-cluster. The counters for "unrouted msg" are increased, but the msgs sent to these exchanges/bindings/queues are *NOT* placed in the alternate-exchange. It's quite bad, because of the above "normal" unrouted msgs we could not just use the counter as "error-indicator". I think I will try to create a valid "bind" in above exchanges, so these will not increment the "unroutable"-counter and use the counter as monitoring-target. Fabian Am Fr., 21. Aug. 2020 um 10:28 Uhr schrieb Fabian Zimmermann : > > Hi, > > yeah, that's what I'm currently using. > > I also tried to use the unroutable-counters, but these are only > available for channels, which may not have any bindings, so there is > no way to find the "root cause" > > I created an AE "unroutable" and wrote a script to show me the msgs > placed here.. currently I get > > -- > 20 Exchange: q-agent-notifier-network-delete_fanout, RoutingKey: > 226 Exchange: q-agent-notifier-port-delete_fanout, RoutingKey: > 88 Exchange: q-agent-notifier-port-update_fanout, RoutingKey: > 388 Exchange: q-agent-notifier-security_group-update_fanout, RoutingKey: > -- > > I think I will start another thread to debug the reason for this, > because it has nothing to do with "broken bindings". > > Fabian > > Am Fr., 21. Aug. 2020 um 10:13 Uhr schrieb Arnaud Morin > : > > > > Hey, > > I am talking about that: > > https://www.rabbitmq.com/ae.html > > > > Cheers, > > > > -- > > Arnaud Morin > > > > On 21.08.20 - 09:06, Fabian Zimmermann wrote: > > > Hi, > > > > > > don't understand what you mean with "alternate exchange"? I'm doing > > > all my tests on my DEV-Env? It's a completely separated / dedicated > > > (virtual) cluster. > > > > > > I just enabled the feature and wrote a small script to read the > > > metrics from the api. > > > > > > I'm having some "dropped msg" in my cluster, just trying to figure out > > > if they are "normal". > > > > > > Fabian > > > > > > Am Do., 20. Aug. 2020 um 21:28 Uhr schrieb Arnaud MORIN > > > : > > > > > > > > Hello, > > > > Are you doing that using alternate exchange ? 
> > > > I started configuring it in our env but not yet finished. > > > > > > > > Cheers, > > > > > > > > Le jeu. 20 août 2020 à 19:16, Fabian Zimmermann a écrit : > > > >> > > > >> Hi, > > > >> > > > >> just another idea: > > > >> > > > >> Rabbitmq is able to count undelivered messages. We could use this information to detect the broken bindings (causing undeliverable messages). > > > >> > > > >> Anyone already doing this? > > > >> > > > >> I currently don't have a way to reproduce the broken bindings, so I'm unable to proof the idea. > > > >> > > > >> Seems we have to wait issue to happen again - what - hopefully - never happens :) > > > >> > > > >> Fabian > > > >> > > > >> Arnaud Morin schrieb am Di., 18. Aug. 2020, 14:07: > > > >>> > > > >>> Hey all, > > > >>> > > > >>> About the vexxhost strategy to use only one rabbit server and manage HA through > > > >>> rabbit. > > > >>> Do you plan to do the same for MariaDB/MySQL? > > > >>> > > > >>> -- > > > >>> Arnaud Morin > > > >>> > > > >>> On 14.08.20 - 18:45, Fabian Zimmermann wrote: > > > >>> > Hi, > > > >>> > > > > >>> > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > > >>> > one rabbitmq Container per Service. Just the kubernetes self healing is > > > >>> > used as "ha" for rabbitmq. > > > >>> > > > > >>> > That seems to match with my finding: run rabbitmq standalone and use an > > > >>> > external system to restart rabbitmq if required. > > > >>> > > > > >>> > Fabian > > > >>> > > > > >>> > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > >>> > > > > >>> > > Fabian, > > > >>> > > > > > >>> > > what do you mean? > > > >>> > > > > > >>> > > >> I think vexxhost is running (1) with their openstack-operator - for > > > >>> > > reasons. > > > >>> > > > > > >>> > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > >>> > > wrote: > > > >>> > > > > > > >>> > > > Hello again, > > > >>> > > > > > > >>> > > > just a short update about the results of my tests. > > > >>> > > > > > > >>> > > > I currently see 2 ways of running openstack+rabbitmq > > > >>> > > > > > > >>> > > > 1. without durable-queues and without replication - just one > > > >>> > > rabbitmq-process which gets (somehow) restarted if it fails. > > > >>> > > > 2. durable-queues and replication > > > >>> > > > > > > >>> > > > Any other combination of these settings leads to more or less issues with > > > >>> > > > > > > >>> > > > * broken / non working bindings > > > >>> > > > * broken queues > > > >>> > > > > > > >>> > > > I think vexxhost is running (1) with their openstack-operator - for > > > >>> > > reasons. > > > >>> > > > > > > >>> > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > >>> > > replication but without durable-queues. > > > >>> > > > > > > >>> > > > May someone point me to the best way to document these findings to some > > > >>> > > official doc? > > > >>> > > > I think a lot of installations out there will run into issues if - under > > > >>> > > load - a node fails. > > > >>> > > > > > > >>> > > > Fabian > > > >>> > > > > > > >>> > > > > > > >>> > > > Am Do., 13. Aug. 
2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > >>> > > dev.faz at gmail.com>: > > > >>> > > >> > > > >>> > > >> Hi, > > > >>> > > >> > > > >>> > > >> just did some short tests today in our test-environment (without > > > >>> > > durable queues and without replication): > > > >>> > > >> > > > >>> > > >> * started a rally task to generate some load > > > >>> > > >> * kill-9-ed rabbitmq on one node > > > >>> > > >> * rally task immediately stopped and the cloud (mostly) stopped working > > > >>> > > >> > > > >>> > > >> after some debugging i found (again) exchanges which had bindings to > > > >>> > > queues, but these bindings didnt forward any msgs. > > > >>> > > >> Wrote a small script to detect these broken bindings and will now check > > > >>> > > if this is "reproducible" > > > >>> > > >> > > > >>> > > >> then I will try "durable queues" and "durable queues with replication" > > > >>> > > to see if this helps. Even if I would expect > > > >>> > > >> rabbitmq should be able to handle this without these "hidden broken > > > >>> > > bindings" > > > >>> > > >> > > > >>> > > >> This just FYI. > > > >>> > > >> > > > >>> > > >> Fabian > > > >>> > > From adriant at catalystcloud.nz Fri Aug 21 11:30:24 2020 From: adriant at catalystcloud.nz (Adrian Turjak) Date: Fri, 21 Aug 2020 23:30:24 +1200 Subject: Using os_token In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04814561@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814561@gmsxchsvr01.thecreation.com> Message-ID: hah, that's my issue. The problem though is that keystoneauth actually does fetch a new token every time even when you supply it with one, but that new token is based on the one you supply, and is a scoped token. It's likely the api for getting a token from an existing one is faster than password auth. I wish there was a way to  have the tools actually reuse a given scoped token rather than fetch a new one every time... OS_TOKEN is also useful/important because of MFA, which otherwise wouldn't work unless you reuse a token. And I'm hoping that when someone has time to work MFA support properly into the cli tool they can hopefully also think about how to make the token reuse better. On 21/08/20 8:29 pm, Eric K. Miller wrote: > I happened to run across an unrelated github issue: > https://github.com/terraform-providers/terraform-provider-openstack/issu > es/271 > > which gave me a clue to what I was missing. I needed to include some > additional variables (see steps 7 through 9 below). > > Revised steps - which works fine with the OpenStack Client: > 0) set appropriate OS_* variables for password authentication > 1) create a token using "openstack token issue" > 2) unset all OS_* environment variables > 3) set OS_TOKEN to the token's value provided in #1 > 4) set OS_AUTH_TYPE to "v3token" > 5) set OS_AUTH_URL to the respective KeyStone endpoint > 6) set OS_IDENTITY_API_VERSION to "3" > 7) set OS_PROJECT_DOMAIN_ID as appropriate > 8) set OS_PROJECT_NAME as appropriate > 9) set OS_REGION_NAME as appropriate > 10) use the CLI as normal > > This shaves anywhere from 0.2 to 0.6 seconds off of a test command I'm > running when compared to password authentication (which normally takes > about 2.5 seconds to run), where a new token is issued each time. > > openstack token revoke works as expected too. 
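For completeness, the same token reuse from Python via keystoneauth looks roughly like this (URL and scope values are illustrative), and as noted above keystoneauth still exchanges the supplied token for a fresh scoped one under the hood:

    from keystoneauth1.identity import v3
    from keystoneauth1 import session

    auth = v3.Token(
        auth_url='https://keystone.example.com:5000/v3',
        token='<existing token>',
        project_name='demo',
        project_domain_id='default',
    )
    sess = session.Session(auth=auth)

    # the re-scoped token is cached for the lifetime of this plugin/session
    print(sess.get_token())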
> > Eric > > > From sandeep.ee.nagendra at gmail.com Thu Aug 20 16:32:01 2020 From: sandeep.ee.nagendra at gmail.com (sandeep) Date: Thu, 20 Aug 2020 22:02:01 +0530 Subject: Cliff auto completion not working inside interactive mode Message-ID: Hi Team, In my system, I am trying auto completion for my CLI application. *CLIFF version - cliff==3.4.0* Auto complete works fine on bash prompt. But inside the interactive shell, auto complete does not work. Below is the screenshot for the help command inside the interactive shell. [image: image.png] Now, if I type swm and press tab, it lists all the sub commands under it. But, swm s gives swm "s and further command auto completion does not work. [image: image.png] Could you please let me know what could be the problem? Is this a known issue? or Am i missing something? Thanks, Sandeep -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 28735 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7345 bytes Desc: not available URL: From jasowang at redhat.com Fri Aug 21 03:14:41 2020 From: jasowang at redhat.com (Jason Wang) Date: Fri, 21 Aug 2020 11:14:41 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820142740.6513884d.cohuck@redhat.com> References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> <20200819081338.GC21172@joy-OptiPlex-7040> <20200820142740.6513884d.cohuck@redhat.com> Message-ID: On 2020/8/20 下午8:27, Cornelia Huck wrote: > On Wed, 19 Aug 2020 17:28:38 +0800 > Jason Wang wrote: > >> On 2020/8/19 下午4:13, Yan Zhao wrote: >>> On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote: >>>> On 2020/8/19 下午2:59, Yan Zhao wrote: >>>>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: >>>>>> On 2020/8/19 上午11:30, Yan Zhao wrote: >>>>>>> hi All, >>>>>>> could we decide that sysfs is the interface that every VFIO vendor driver >>>>>>> needs to provide in order to support vfio live migration, otherwise the >>>>>>> userspace management tool would not list the device into the compatible >>>>>>> list? >>>>>>> >>>>>>> if that's true, let's move to the standardizing of the sysfs interface. >>>>>>> (1) content >>>>>>> common part: (must) >>>>>>> - software_version: (in major.minor.bugfix scheme) >>>>>> This can not work for devices whose features can be negotiated/advertised >>>>>> independently. (E.g virtio devices) > I thought the 'software_version' was supposed to describe kind of a > 'protocol version' for the data we transmit? I.e., you add a new field, > you bump the version number. Ok, but since we mandate backward compatibility of uABI, is this really worth to have a version for sysfs? (Searching on sysfs shows no examples like this) > >>>>>> >>>>> sorry, I don't understand here, why virtio devices need to use vfio interface? >>>> I don't see any reason that virtio devices can't be used by VFIO. Do you? 
>>>> >>>> Actually, virtio devices have been used by VFIO for many years: >>>> >>>> - passthrough a hardware virtio devices to userspace(VM) drivers >>>> - using virtio PMD inside guest >>>> >>> So, what's different for it vs passing through a physical hardware via VFIO? >> >> The difference is in the guest, the device could be either real hardware >> or emulated ones. >> >> >>> even though the features are negotiated dynamically, could you explain >>> why it would cause software_version not work? >> >> Virtio device 1 supports feature A, B, C >> Virtio device 2 supports feature B, C, D >> >> So you can't migrate a guest from device 1 to device 2. And it's >> impossible to model the features with versions. > We're talking about the features offered by the device, right? Would it > be sufficient to mandate that the target device supports the same > features or a superset of the features supported by the source device? Yes. > >> >>> >>>>> I think this thread is discussing about vfio related devices. >>>>> >>>>>>> - device_api: vfio-pci or vfio-ccw ... >>>>>>> - type: mdev type for mdev device or >>>>>>> a signature for physical device which is a counterpart for >>>>>>> mdev type. >>>>>>> >>>>>>> device api specific part: (must) >>>>>>> - pci id: pci id of mdev parent device or pci id of physical pci >>>>>>> device (device_api is vfio-pci)API here. >>>>>> So this assumes a PCI device which is probably not true. >>>>>> >>>>> for device_api of vfio-pci, why it's not true? >>>>> >>>>> for vfio-ccw, it's subchannel_type. >>>> Ok but having two different attributes for the same file is not good idea. >>>> How mgmt know there will be a 3rd type? >>> that's why some attributes need to be common. e.g. >>> device_api: it's common because mgmt need to know it's a pci device or a >>> ccw device. and the api type is already defined vfio.h. >>> (The field is agreed by and actually suggested by Alex in previous mail) >>> type: mdev_type for mdev. if mgmt does not understand it, it would not >>> be able to create one compatible mdev device. >>> software_version: mgmt can compare the major and minor if it understands >>> this fields. >> >> I think it would be helpful if you can describe how mgmt is expected to >> work step by step with the proposed sysfs API. This can help people to >> understand. > My proposal would be: > - check that device_api matches > - check possible device_api specific attributes > - check that type matches [I don't think the combination of mdev types > and another attribute to determine compatibility is a good idea; Any reason for this? Actually if we only use mdev type to detect the compatibility, it would be much more easier. Otherwise, we are actually re-inventing mdev types. E.g can we have the same mdev types with different device_api and other attributes? > actually, the current proposal confuses me every time I look at it] > - check that software_version is compatible, assuming semantic > versioning > - check possible type-specific attributes I'm not sure if this is too complicated. And I suspect there will be vendor specific attributes: - for compatibility check: I think we should either modeling everything via mdev type or making it totally vendor specific. Having something in the middle will bring a lot of burden - for provisioning: it's still not clear. 
As shown in this proposal, for NVME we may need to set remote_url, but unless there will be a subclass (NVME) in the mdev (which I guess not), we can't prevent vendor from using another attribute name, in this case, tricks like attributes iteration in some sub directory won't work. So even if we had some common API for compatibility check, the provisioning API is still vendor specific ... Thanks > >> Thanks for the patience. Since sysfs is uABI, when accepted, we need >> support it forever. That's why we need to be careful. > Nod. > > (...) From mahdi.abbasi.2013 at gmail.com Fri Aug 21 08:04:41 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Fri, 21 Aug 2020 12:34:41 +0430 Subject: Nova Docker Message-ID: Hi openstack development team, Given that the nova-docker peoject is np longer availble, is there any solution for creating a docker instanace in openstack? Best regards Mahdi -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbengt at redhat.com Fri Aug 21 09:58:09 2020 From: dbengt at redhat.com (Daniel Bengtsson) Date: Fri, 21 Aug 2020 11:58:09 +0200 Subject: Can't fetch from opendev. In-Reply-To: <20200818152414.s5srmotngy7a7w7r@yuggoth.org> References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> <20200817143703.c5rh3eqcl3ihxy4m@yuggoth.org> <6590e740-00f1-ee60-ac00-5872039e0cb0@redhat.com> <20200818152414.s5srmotngy7a7w7r@yuggoth.org> Message-ID: <2307899b-c1ac-43f0-4fa9-bd61e164979f@redhat.com> On 8/18/20 5:24 PM, Jeremy Stanley wrote: > and it just hangs indefinitely and never returns an error? Yes. > One reason I suspect this might be the problem is that GitHub is > IPv4-only, so if you have something black-holing or blocking traffic > for global IPv6 routes, then that could cause the behavior you're > observing. I have the same problem with the -4 option. From rosmaita.fossdev at gmail.com Fri Aug 21 13:36:50 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 21 Aug 2020 09:36:50 -0400 Subject: [cinder] reviewing priorities for next few weeks Message-ID: <5ac97d24-571c-c25a-a896-d27b5413f98d@gmail.com> Hello Cinderinos, We're near the end of week R-8, and people are getting antsy about having their changes reviewed. So here are the Cinder project reviewing priorities over the next few weeks. (1) os-brick The Victoria release of os-brick must take place during week R-6 (i.e., by 31 August). Hence, os-brick reviews are the project's TOP PRIORITY right now. Among os-brick reviews, these are the most important changes: - support for volume-local-cache feature https://review.opendev.org/663549 - support for cinderlib RBD use https://review.opendev.org/#/q/topic:cinderlib-changes+status:open - code cleanup (should be quick reviews) https://review.opendev.org/#/q/status:open+project:openstack/os-brick+branch:master+topic:major-bump There are some other open reviews in master; if they interest you, go for it. But the above are release-critical. (2) cinder features requiring python-cinderclient support The Victoria release of python-cinderclient must take place during week R-5 (i.e., by 7 September). - project-level default volume-types cinder: https://review.opendev.org/737707 cinderclient: https://review.opendev.org/739223/ - active-active support https://review.opendev.org/#/q/topic:a-a-support+status:open There are other open reviews in master for python-cinderclient that could use some eyes. I didn't see anything major, but it would be good to look in case I missed something. 
(3) other features (including drivers) The feature freeze is the end of R-5 (i.e., 8 September) - Use the blueprints to find these: https://blueprints.launchpad.net/cinder/victoria And we'll be reviewing everything else also, but items in the 3 categories above get priority. If your patch is in category (3) or not-prioritized, you can always help speed things up by reviewing the higher-priority items. We're almost at the end of the Victoria cycle. Let's have a productive few weeks! brian From fungi at yuggoth.org Fri Aug 21 13:45:54 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 21 Aug 2020 13:45:54 +0000 Subject: Can't fetch from opendev. In-Reply-To: <2307899b-c1ac-43f0-4fa9-bd61e164979f@redhat.com> References: <58c9ecb6-d1cc-df2f-caa8-693ed3f03d00@redhat.com> <20200817143703.c5rh3eqcl3ihxy4m@yuggoth.org> <6590e740-00f1-ee60-ac00-5872039e0cb0@redhat.com> <20200818152414.s5srmotngy7a7w7r@yuggoth.org> <2307899b-c1ac-43f0-4fa9-bd61e164979f@redhat.com> Message-ID: <20200821134553.cpbt2q3l5gw7zbvk@yuggoth.org> On 2020-08-21 11:58:09 +0200 (+0200), Daniel Bengtsson wrote: > On 8/18/20 5:24 PM, Jeremy Stanley wrote: [...] > > and it just hangs indefinitely and never returns an error? > > Yes. > > > One reason I suspect this might be the problem is that GitHub is > > IPv4-only, so if you have something black-holing or blocking traffic > > for global IPv6 routes, then that could cause the behavior you're > > observing. > > I have the same problem with the -4 option. You mentioned earlier that you and your colleague are both using a work VPN. If this is a full-tunnel VPN, or a split-tunnel providing conflicting routes, it's possible something within the work network is eating or not properly rerouting your packets or the return responses. Have you tried a git fetch with the VPN temporarily turned off? Are you able to browse https://opendev.org/ from the same system? Are you running git from directly within your system, or are you running it inside a virtual machine/container on your system? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jean-philippe at evrard.me Fri Aug 21 14:22:23 2020 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Fri, 21 Aug 2020 16:22:23 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: Message-ID: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: > Hi, > > if nobody complains I also would like to request core status to help getting the project further. > > Fabian Zimmermann Let's hope this will not be lost in the list :) From jean-philippe at evrard.me Fri Aug 21 14:35:49 2020 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Fri, 21 Aug 2020 16:35:49 +0200 Subject: [releases] Dropping my releases core/release-manager hat Message-ID: Hello folks, I am sad to announce that, while super motivated to keep helping the team, I cannot reliably and consistantly do my duties of core in the releases team, due to my current duties at work. It's been a while I haven't significantly helped the release team, and the team deserve all the transparency and clarity it can get about its contributors. It's time for me to step down. It's been a pleasure to help the team while it lasted. If you are looking for a team to get involved in OpenStack, make no mistake, the release team is awesome. 
Thank you everyone in the team, you were all amazing and so welcoming :) Regards, Jean-Philippe Evrard (evrardjp) From radoslaw.piliszek at gmail.com Fri Aug 21 14:42:59 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Fri, 21 Aug 2020 16:42:59 +0200 Subject: Nova Docker In-Reply-To: References: Message-ID: Hi, You might be interested in Zun. [1] [1] https://opendev.org/openstack/zun -yoctozepto On Fri, Aug 21, 2020 at 3:36 PM mahdi abbasi wrote: > > Hi openstack development team, > > Given that the nova-docker peoject is np longer availble, is there any solution for creating a docker instanace in openstack? > > Best regards > Mahdi From hberaud at redhat.com Fri Aug 21 15:21:59 2020 From: hberaud at redhat.com (Herve Beraud) Date: Fri, 21 Aug 2020 17:21:59 +0200 Subject: [releases] Dropping my releases core/release-manager hat In-Reply-To: References: Message-ID: Thanks for all the things you have done as a team member! Le ven. 21 août 2020 à 16:39, Jean-Philippe Evrard a écrit : > Hello folks, > > I am sad to announce that, while super motivated to keep helping the team, > I cannot reliably and consistantly do my duties of core in the releases > team, due to my current duties at work. > > It's been a while I haven't significantly helped the release team, and the > team deserve all the transparency and clarity it can get about its > contributors. It's time for me to step down. > > It's been a pleasure to help the team while it lasted. If you are looking > for a team to get involved in OpenStack, make no mistake, the release team > is awesome. Thank you everyone in the team, you were all amazing and so > welcoming :) > > Regards, > Jean-Philippe Evrard (evrardjp) > > > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Fri Aug 21 16:59:19 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 21 Aug 2020 18:59:19 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: Hi, As long as there are enough cores to keep the project running everything is fine :) Fabian Jean-Philippe Evrard schrieb am Fr., 21. Aug. 2020, 16:32: > > On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: > > Hi, > > > > if nobody complains I also would like to request core status to help > getting the project further. > > > > Fabian Zimmermann > > Let's hope this will not be lost in the list :) > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cohuck at redhat.com Fri Aug 21 14:52:55 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Fri, 21 Aug 2020 16:52:55 +0200 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> <20200819081338.GC21172@joy-OptiPlex-7040> <20200820142740.6513884d.cohuck@redhat.com> Message-ID: <20200821165255.53e26628.cohuck@redhat.com> On Fri, 21 Aug 2020 11:14:41 +0800 Jason Wang wrote: > On 2020/8/20 下午8:27, Cornelia Huck wrote: > > On Wed, 19 Aug 2020 17:28:38 +0800 > > Jason Wang wrote: > > > >> On 2020/8/19 下午4:13, Yan Zhao wrote: > >>> On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote: > >>>> On 2020/8/19 下午2:59, Yan Zhao wrote: > >>>>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: > >>>>>> On 2020/8/19 上午11:30, Yan Zhao wrote: > >>>>>>> hi All, > >>>>>>> could we decide that sysfs is the interface that every VFIO vendor driver > >>>>>>> needs to provide in order to support vfio live migration, otherwise the > >>>>>>> userspace management tool would not list the device into the compatible > >>>>>>> list? > >>>>>>> > >>>>>>> if that's true, let's move to the standardizing of the sysfs interface. > >>>>>>> (1) content > >>>>>>> common part: (must) > >>>>>>> - software_version: (in major.minor.bugfix scheme) > >>>>>> This can not work for devices whose features can be negotiated/advertised > >>>>>> independently. (E.g virtio devices) > > I thought the 'software_version' was supposed to describe kind of a > > 'protocol version' for the data we transmit? I.e., you add a new field, > > you bump the version number. > > > Ok, but since we mandate backward compatibility of uABI, is this really > worth to have a version for sysfs? (Searching on sysfs shows no examples > like this) I was not thinking about the sysfs interface, but rather about the data that is sent over while migrating. E.g. we find out that sending some auxiliary data is a good idea and bump to version 1.1.0; version 1.0.0 cannot deal with the extra data, but version 1.1.0 can deal with the older data stream. (...) > >>>>>>> - device_api: vfio-pci or vfio-ccw ... > >>>>>>> - type: mdev type for mdev device or > >>>>>>> a signature for physical device which is a counterpart for > >>>>>>> mdev type. > >>>>>>> > >>>>>>> device api specific part: (must) > >>>>>>> - pci id: pci id of mdev parent device or pci id of physical pci > >>>>>>> device (device_api is vfio-pci)API here. > >>>>>> So this assumes a PCI device which is probably not true. > >>>>>> > >>>>> for device_api of vfio-pci, why it's not true? > >>>>> > >>>>> for vfio-ccw, it's subchannel_type. > >>>> Ok but having two different attributes for the same file is not good idea. > >>>> How mgmt know there will be a 3rd type? > >>> that's why some attributes need to be common. e.g. > >>> device_api: it's common because mgmt need to know it's a pci device or a > >>> ccw device. and the api type is already defined vfio.h. > >>> (The field is agreed by and actually suggested by Alex in previous mail) > >>> type: mdev_type for mdev. if mgmt does not understand it, it would not > >>> be able to create one compatible mdev device. > >>> software_version: mgmt can compare the major and minor if it understands > >>> this fields. 
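[Editor's note: as a side illustration of the "compare major, then accept an equal or newer minor" rule discussed above, a toy shell check is sketched below. The version values are made up and nothing here reflects an agreed-upon sysfs layout; it only shows the comparison logic.]

  # toy illustration only: "src" is the version exposed at the source,
  # "dst" the version offered at the destination; both values (and any
  # sysfs paths they would be read from) are hypothetical
  src=2.1.0
  dst=2.3.1
  src_major=$(echo "$src" | cut -d. -f1); src_minor=$(echo "$src" | cut -d. -f2)
  dst_major=$(echo "$dst" | cut -d. -f1); dst_minor=$(echo "$dst" | cut -d. -f2)
  if [ "$src_major" -eq "$dst_major" ] && [ "$dst_minor" -ge "$src_minor" ]; then
      echo "compatible"
  else
      echo "incompatible"
  fi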
> >> > >> I think it would be helpful if you can describe how mgmt is expected to > >> work step by step with the proposed sysfs API. This can help people to > >> understand. > > My proposal would be: > > - check that device_api matches > > - check possible device_api specific attributes > > - check that type matches [I don't think the combination of mdev types > > and another attribute to determine compatibility is a good idea; > > > Any reason for this? Actually if we only use mdev type to detect the > compatibility, it would be much more easier. Otherwise, we are actually > re-inventing mdev types. > > E.g can we have the same mdev types with different device_api and other > attributes? In the end, the mdev type is represented as a string; but I'm not sure we can expect that two types with the same name, but a different device_api are related in any way. If we e.g. compare vfio-pci and vfio-ccw, they are fundamentally different. I was mostly concerned about the aggregation proposal, where type A + aggregation value b might be compatible with type B + aggregation value a. > > > > actually, the current proposal confuses me every time I look at it] > > - check that software_version is compatible, assuming semantic > > versioning > > - check possible type-specific attributes > > > I'm not sure if this is too complicated. And I suspect there will be > vendor specific attributes: > > - for compatibility check: I think we should either modeling everything > via mdev type or making it totally vendor specific. Having something in > the middle will bring a lot of burden FWIW, I'm for a strict match on mdev type, and flexibility in per-type attributes. > - for provisioning: it's still not clear. As shown in this proposal, for > NVME we may need to set remote_url, but unless there will be a subclass > (NVME) in the mdev (which I guess not), we can't prevent vendor from > using another attribute name, in this case, tricks like attributes > iteration in some sub directory won't work. So even if we had some > common API for compatibility check, the provisioning API is still vendor > specific ... Yes, I'm not sure how to deal with the "same thing for different vendors" problem. We can try to make sure that in-kernel drivers play nicely, but not much more. From mahdi.abbasi.2013 at gmail.com Fri Aug 21 17:13:47 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Fri, 21 Aug 2020 21:43:47 +0430 Subject: Nova Docker In-Reply-To: References: Message-ID: Thanks a lot On Fri, 21 Aug 2020, 19:13 Radosław Piliszek, wrote: > Hi, > > You might be interested in Zun. [1] > > [1] https://opendev.org/openstack/zun > > -yoctozepto > > On Fri, Aug 21, 2020 at 3:36 PM mahdi abbasi > wrote: > > > > Hi openstack development team, > > > > Given that the nova-docker peoject is np longer availble, is there any > solution for creating a docker instanace in openstack? > > > > Best regards > > Mahdi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyliu0592 at hotmail.com Fri Aug 21 18:42:11 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Fri, 21 Aug 2020 18:42:11 +0000 Subject: [Kolla Ansible] host maintenance Message-ID: Hi, I wonder if it's supported by Kolla Ansible to deploy a specific host and add it into existing cluster, like replace a control host or compute host? Thanks! 
Tony From dev.faz at gmail.com Fri Aug 21 20:19:42 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 21 Aug 2020 22:19:42 +0200 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: Message-ID: Hi, seems like someone else is trying to migrate an existing setup to kolla 😉 We currently try it step by step. 1. Use kolla images instead of self developed builder. 2. Generate suitable kolla configuration file layout 3. Hopefully kolla-ansible will hand over But we are still in PoC state. Fabian Tony Liu schrieb am Fr., 21. Aug. 2020, 20:49: > Hi, > > I wonder if it's supported by Kolla Ansible to deploy a specific > host and add it into existing cluster, like replace a control > host or compute host? > > > Thanks! > Tony > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Fri Aug 21 20:45:07 2020 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Fri, 21 Aug 2020 20:45:07 +0000 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: Message-ID: Hello, if you are working on a migration to Kolla, there is a nice guide written by StackHPC that provides one example approach for this complicated maneuver: https://www.stackhpc.com/migrating-to-kolla.html Perhaps not relevant to your specific case, but it can offer some guidance! Cheers, /Jason On Aug 21, 2020, at 3:19 PM, Fabian Zimmermann > wrote: Hi, seems like someone else is trying to migrate an existing setup to kolla 😉 We currently try it step by step. 1. Use kolla images instead of self developed builder. 2. Generate suitable kolla configuration file layout 3. Hopefully kolla-ansible will hand over But we are still in PoC state. Fabian Tony Liu > schrieb am Fr., 21. Aug. 2020, 20:49: Hi, I wonder if it's supported by Kolla Ansible to deploy a specific host and add it into existing cluster, like replace a control host or compute host? Thanks! Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyliu0592 at hotmail.com Fri Aug 21 22:15:25 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Fri, 21 Aug 2020 22:15:25 +0000 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: Message-ID: Actually, in my case, the setup is originally deploy by Kolla Ansible. Other than the initial deployment, I am looking for using Kolla Ansible for maintenance operations. What I am looking for, eg. replace a host, can surely be done by manual steps or customized script. I'd like to know if they are automated by Kolla Ansible. Thanks! Tony > -----Original Message----- > From: Jason Anderson > Sent: Friday, August 21, 2020 1:45 PM > To: Fabian Zimmermann > Cc: Tony Liu ; openstack-discuss discuss at lists.openstack.org> > Subject: Re: [Kolla Ansible] host maintenance > > Hello, if you are working on a migration to Kolla, there is a nice guide > written by StackHPC that provides one example approach for this > complicated maneuver: https://www.stackhpc.com/migrating-to-kolla.html > > Perhaps not relevant to your specific case, but it can offer some > guidance! > > Cheers, > /Jason > > > > On Aug 21, 2020, at 3:19 PM, Fabian Zimmermann > wrote: > > Hi, > > seems like someone else is trying to migrate an existing setup to > kolla 😉 > > We currently try it step by step. > > 1. Use kolla images instead of self developed builder. > 2. Generate suitable kolla configuration file layout > 3. Hopefully kolla-ansible will hand over > > But we are still in PoC state. 
> > Fabian > > Tony Liu > > schrieb am Fr., 21. Aug. 2020, 20:49: > > > Hi, > > I wonder if it's supported by Kolla Ansible to deploy a > specific > host and add it into existing cluster, like replace a control > host or compute host? > > > Thanks! > Tony > > > > From emiller at genesishosting.com Sat Aug 22 00:03:00 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Fri, 21 Aug 2020 19:03:00 -0500 Subject: Using os_token In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814561@gmsxchsvr01.thecreation.com> Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814568@gmsxchsvr01.thecreation.com> > The problem though is that keystoneauth actually does fetch a new token > every time even when you supply it with one, but that new token is based > on the one you supply, and is a scoped token. It's likely the api for > getting a token from an existing one is faster than password auth. I > wish there was a way to  have the tools actually reuse a given scoped > token rather than fetch a new one every time... Interesting! I was kinda wondering if that was actually what was happening. It still seems like quite a bit of a delay compared to running the OpenStack Client and running commands on its command line repeatedly (as opposed to loading the OpenStack Client each time). I assumed that there was still some work to load Python, etc., but using --debug does show a pull of the service catalog, which is slow. It definitely would be nice to have a way to save/load the "session" that is created by the OpenStack Client to avoid all of the overhead, or, as you said, provide a scoped token. I tested the performance again to be sure I wasn't going crazy, and with OS_TOKEN, it is definitely between 0.2 and 0.6 (most of the time at the higher end of this) seconds faster. Any improvement is good. > OS_TOKEN is also useful/important because of MFA, which otherwise > wouldn't work unless you reuse a token. And I'm hoping that when someone > has time to work MFA support properly into the cli tool they can > hopefully also think about how to make the token reuse better. You mentioned "MFA support properly". What issue exists? I'm interested since I was about to look into this next. Thanks! Eric From emiller at genesishosting.com Sat Aug 22 00:09:53 2020 From: emiller at genesishosting.com (Eric K. Miller) Date: Fri, 21 Aug 2020 19:09:53 -0500 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: Message-ID: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> > Actually, in my case, the setup is originally deploy by > Kolla Ansible. Other than the initial deployment, I am > looking for using Kolla Ansible for maintenance operations. > What I am looking for, eg. replace a host, can surely be > done by manual steps or customized script. I'd like to know > if they are automated by Kolla Ansible. We do this often by simply using the "limit" flag in Kolla Ansible to only include the controllers and new compute node (after adding the compute node to the multinode.ini file). Specify "reconfigure" for the action, and not "install". Eric From tonyliu0592 at hotmail.com Sat Aug 22 01:14:49 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Sat, 22 Aug 2020 01:14:49 +0000 Subject: [Kolla Ansible] host maintenance In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> Message-ID: Thanks Eric! I will run some tests to validate. 
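[Editor's note: for readers following along, a minimal sketch of the approach Eric describes above. The inventory file name, the "control" group and the host name come from a typical Kolla Ansible multinode setup and are placeholders; exact flags can differ between releases.]

  # after adding the new compute host to the [compute] group in multinode.ini,
  # reconfigure only the controllers plus the new host
  kolla-ansible -i multinode.ini reconfigure --limit control,new-compute-01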
Tony > -----Original Message----- > From: Eric K. Miller > Sent: Friday, August 21, 2020 5:10 PM > To: openstack-discuss > Subject: RE: [Kolla Ansible] host maintenance > > > Actually, in my case, the setup is originally deploy by Kolla Ansible. > > Other than the initial deployment, I am looking for using Kolla > > Ansible for maintenance operations. > > What I am looking for, eg. replace a host, can surely be done by > > manual steps or customized script. I'd like to know if they are > > automated by Kolla Ansible. > > We do this often by simply using the "limit" flag in Kolla Ansible to > only include the controllers and new compute node (after adding the > compute node to the multinode.ini file). Specify "reconfigure" for the > action, and not "install". > > Eric From 358111907 at qq.com Sat Aug 22 01:24:48 2020 From: 358111907 at qq.com (=?gb18030?B?wO7WvtS2?=) Date: Sat, 22 Aug 2020 09:24:48 +0800 Subject: About Devstack Message-ID: I'm sorry to disturb you. Recently, I tried to install openstack through devstack. When I input "./stack.sh". I can install openstack successfully. Then I tried to create a cloud instance and use the public network 172.24.4.0/24 which is created during installation( this subnet is created by default, I didn't configure network informartion in local.conf before installation). And the instance can access to the Internet smoothly. But the instance will not access the Internet when I reboot my server  (physical machine). After rebooting, I input "sudo ifconfig br-ex 172.24.4.1/24 up", the instance can access my server IP, but it can't PING the gateway addresses of my server. Of course, the instance also can't access the Internet. But my server can PING it's gateway and access to the Internet. Finally, the cloud instance can only communicate with my server. I tried many methods to restore the network environment of my openstack. But I can't find the reason. So I need your help. I install the version of devstack is stable/train. Thank you very much! -------------- next part -------------- An HTML attachment was scrubbed... URL: From berndbausch at mailbox.org Sat Aug 22 02:33:44 2020 From: berndbausch at mailbox.org (Bernd Bausch) Date: Sat, 22 Aug 2020 11:33:44 +0900 Subject: OpenStack user survey data available? Message-ID: Is there a way to get access to raw user survey data? Some of the graphical data on the analytics page is unreadable, in particular information about Neutron's drivers: The analytics FAQ tells me to contact heidijoy at openstack.org for questions, but email to this address bounces back. 550 5.1.1 : Email address could not be found, or was misspelled (G8) Thanks much, Bernd -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: jjojlgndnihopfcd.png Type: image/png Size: 37546 bytes Desc: not available URL: From allison at openstack.org Sat Aug 22 17:09:56 2020 From: allison at openstack.org (Allison Price) Date: Sat, 22 Aug 2020 12:09:56 -0500 Subject: OpenStack user survey data available? In-Reply-To: References: Message-ID: Hi Bernd, Thanks for reaching out. We will change the FAQ with updated contact information, but I can help you on this request. Attached is a spreadsheet with the anonymous data distribution for the Neutron driver question from the 2019 survey. The 2020 data will be available soon, so please let me know if that’s data you would like as well. 
If there are other specific questions you would like anonymous data on,
please let me know, as it does need to be pulled manually.

Cheers,
Allison

Allison Price
OpenStack Foundation
allison at openstack.org

> On Aug 21, 2020, at 9:33 PM, Bernd Bausch  wrote:
> 
> Is there a way to get access to raw user survey data? Some of the
> graphical data on the analytics page is unreadable, in particular
> information about Neutron's drivers:
> 
> The analytics FAQ tells me to contact heidijoy at openstack.org for
> questions, but email to this address bounces back.
> 
> 550 5.1.1 : Email address could not be found, or was misspelled (G8)
> 
> Thanks much,
> 
> Bernd
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2019_networking drivers.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 10184 bytes
Desc: not available
URL: 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mahdi.abbasi.2013 at gmail.com  Sat Aug 22 16:43:58 2020
From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi)
Date: Sat, 22 Aug 2020 21:13:58 +0430
Subject: python-zunclient
Message-ID: 

Hi openstack development team,

I installed python-zunclient successfully, but the openstack appcontainer
command still returns "not found". Please help me.

Best regards
Mahdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hongbin034 at gmail.com  Sat Aug 22 19:10:55 2020
From: hongbin034 at gmail.com (Hongbin Lu)
Date: Sat, 22 Aug 2020 15:10:55 -0400
Subject: python-zunclient
In-Reply-To: 
References: 
Message-ID: 

How did you install python-zunclient, and how did you install
openstackclient? My best guess is a mix-up of the python2/3 environments.
Mind pasting the output of the following commands?

$ pip --version
$ pip freeze
$ pip3 freeze

On Sat, Aug 22, 2020 at 3:03 PM mahdi abbasi  wrote:

> Hi openstack development team,
>
> I installed python-zunclient successfully, but the openstack appcontainer
> command still returns "not found". Please help me.
>
> Best regards
> Mahdi
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jomin0613 at gmail.com  Sun Aug 23 12:18:32 2020
From: jomin0613 at gmail.com (Mingi Jo)
Date: Sun, 23 Aug 2020 21:18:32 +0900
Subject: [keystone] openstack token auth scope system Question
Message-ID: 

Hi, I'm studying OpenStack. If you use OpenStack with a keystone token on
all machines and there is a project in the endpoint URL, the API request
cannot be made properly. The error returned is a 400, and the request
fails.

We've looked into this and found
https://bugs.launchpad.net/cinder/+bug/1745905
Here is the bug report, and it looks like that work has been completed.
However, the installation guides for various services such as cinder,
swift, and probe require a project to be included in the endpoint URL,
which seems contradictory. Is there any way to fix this?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From berndbausch at gmail.com  Sun Aug 23 12:39:28 2020
From: berndbausch at gmail.com (Bernd Bausch)
Date: Sun, 23 Aug 2020 21:39:28 +0900
Subject: [simplification] Making ask.openstack.org read-only
In-Reply-To: 
References: 
Message-ID: <648c6ac3-0ab8-e442-ed9b-fbbfbbea16f7@gmail.com>

Thanks for calling me out, but I am certainly not the only one answering
questions.
After the notification feature broke down entirely, leaving me no way to see which questions I am involved in, it's indeed time to move on. I agree with the change as well. Bernd. On 8/18/2020 7:44 PM, Thierry Carrez wrote: > Hi everyone, > > This has been discussed several times on this mailing list in the > past, but we never got to actually pull the plug. > > Ask.openstack.org was launched in 2013. The reason for hosting our own > setup was to be able to support multiple languages, while > StackOverflow rejected our proposal to have our own openstack-branded > StackExchange site. The Chinese ask.o.o side never really took off. > The English side also never really worked perfectly (like email alerts > are hopelessly broken), but we figured it would get better with time > if a big community formed around it. > > Fast-forward to 2020 and the instance is lacking volunteers to help > run it, while the code (and our customization of it) has become more > complicated to maintain. It regularly fails one way or another, and > questions there often go unanswered, making us look bad. Of the top 30 > users, most have abandoned the platform since 2017, leaving only Bernd > Bausch actively engaging and helping moderate questions lately. We > have called for volunteers several times, but the offers for help > never really materialized. > > At the same time, people are asking OpenStack questions on > StackOverflow, and sometimes getting answers there[1]. The > fragmentation of the "questions" space is not helping users getting > good answers. > > I think it's time to pull the plug, make ask.openstack.org read-only > (so that links to old answers are not lost) and redirect users to the > mailing-list and the "OpenStack" tag on StackOverflow. I picked > StackOverflow since it seems to have the most openstack questions > (2,574 on SO, 76 on SuperUser and 430 on ServerFault). > > We discussed that option several times, but I now proposed a change to > actually make it happen: > > https://review.opendev.org/#/c/746497/ > > It's always a difficult decision to make to kill a resource, but I > feel like in this case, consolidation and simplification would help. > > Thoughts, comments? > > [1] https://stackoverflow.com/questions/tagged/openstack > From mahdi.abbasi.2013 at gmail.com Sat Aug 22 19:27:13 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Sat, 22 Aug 2020 23:57:13 +0430 Subject: python-zunclient In-Reply-To: References: Message-ID: Thanks Hongbin, This Issue has been resolved. On Sat, 22 Aug 2020, 23:41 Hongbin Lu, wrote: > How did you install python-zunclient? and how did you install > openstackclient? My best guess is the mess up of python2/3 environment. > Mind pasting output of the following commands? > > $ pip --version > $ pip freeze > $ pip3 freeze > > On Sat, Aug 22, 2020 at 3:03 PM mahdi abbasi > wrote: > >> Hi openstack development team, >> >> I installed python-zunclient successfully but openstack appcontainer >> command steal return not found. Please help me. >> >> Best regards >> Mahdi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alterriu at gmail.com Mon Aug 24 05:31:43 2020 From: alterriu at gmail.com (Popoi Zen) Date: Mon, 24 Aug 2020 12:31:43 +0700 Subject: [ovn][neutron][ussuri] DPDK support on OVN Openstack? Openstack Doc contradictive Information? Message-ID: I want to implement DPDK on my Openstack using OVN as mechanism driver. 
I read 2 documentation: [1] https://docs.openstack.org/neutron/ussuri/admin/ovn/dpdk.html [2] https://docs.openstack.org/neutron/ussuri/admin/config-ovs-dpdk.html In first doc [1], it is said that DPDK has been supported on OVN. But in second doc [2] it is said `The support of this feature is not yet present in ML2 OVN and ODL mechanism drivers.` Which one is true? because I have TCP checksum issue when implement DPDK on OVN same like this: https://bugs.launchpad.net/neutron/+bug/1832021. It has patched on OVS but not work on OVN. Regards, -------------- next part -------------- An HTML attachment was scrubbed... URL: From alterriu at gmail.com Mon Aug 24 06:57:59 2020 From: alterriu at gmail.com (Popoi Zen) Date: Mon, 24 Aug 2020 13:57:59 +0700 Subject: [ovn][neutron][ussuri] DPDK support on OVN Openstack? Openstack Doc contradictive Information? In-Reply-To: References: Message-ID: Ah, it seems like vhost-user-reconnect feature which is not yet supported. But the problem still, I cant get metadata because of tcp checksum error. On Mon, Aug 24, 2020, 12:31 Popoi Zen wrote: > I want to implement DPDK on my Openstack using OVN as mechanism driver. I > read 2 documentation: > [1] https://docs.openstack.org/neutron/ussuri/admin/ovn/dpdk.html > [2] https://docs.openstack.org/neutron/ussuri/admin/config-ovs-dpdk.html > > In first doc [1], it is said that DPDK has been supported on OVN. But in > second doc [2] it is said `The support of this feature is not yet present > in ML2 OVN and ODL mechanism drivers.` > > Which one is true? because I have TCP checksum issue when implement DPDK > on OVN same like this: https://bugs.launchpad.net/neutron/+bug/1832021. > It has patched on OVS but not work on OVN. > > > Regards, > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berndbausch at gmail.com Mon Aug 24 07:32:31 2020 From: berndbausch at gmail.com (Bernd Bausch) Date: Mon, 24 Aug 2020 16:32:31 +0900 Subject: About Devstack In-Reply-To: References: Message-ID: <17933c25-2fe9-4f0e-b1c5-797d4a97a5dc@gmail.com> Devstack is not meant to be restarted. However, setting the IP address on br-ex and bringing it up is normally sufficient to re-establish networking. After that, you probably still need to recreate the loop devices for Cinder and Swift. What I don't understand: The external network that Devstack sets up by default, named "public", is fake. It's not external, and it's not connected to the outside world at all, thus the IP address range of 172.24.4.0/24. How your instances were able to access the internet without any manual tweaking is a mystery to me. If you did some manual tweaking, I guess it was lost when you rebooted. Perhaps you had a non-persistent routing table entry that connected 172.24.4.0/24 to the outside world? Bernd. On 8/22/2020 10:24 AM, 李志远 wrote: > I'm sorry to disturb you. > Recently, I tried to install openstack through devstack. When I input > "./stack.sh". I can install openstack successfully. > Then I tried to create a cloud instance and use the public network > 172.24.4.0/24 which is created during > installation( this subnet is created by default, I didn't configure > network informartion in local.conf before installation). And the > instance can access to the Internet smoothly. > But the instance will not access the Internet when I reboot my server  > (physical machine). After rebooting, I input "sudo ifconfig br-ex > 172.24.4.1/24 up", the instance can access my > server IP, but it can't PING the gateway addresses of my server. 
Of > course, the instance also can't access the Internet. But my server can > PING it's gateway and access to the Internet. Finally, the cloud > instance can only communicate with my server. > I tried many methods to restore the network environment of my > openstack. But I can't find the reason. So I need your help. I install > the version of devstack is stable/train. Thank you very much! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Aug 24 07:46:13 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 24 Aug 2020 08:46:13 +0100 Subject: [Kolla Ansible] host maintenance In-Reply-To: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> References: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> Message-ID: On Sat, 22 Aug 2020 at 01:10, Eric K. Miller wrote: > > > Actually, in my case, the setup is originally deploy by > > Kolla Ansible. Other than the initial deployment, I am > > looking for using Kolla Ansible for maintenance operations. > > What I am looking for, eg. replace a host, can surely be > > done by manual steps or customized script. I'd like to know > > if they are automated by Kolla Ansible. > > We do this often by simply using the "limit" flag in Kolla Ansible to only include the controllers and new compute node (after adding the compute node to the multinode.ini file). Specify "reconfigure" for the action, and not "install". We need some better docs around this, and I think they will be added soon. Some things to watch out for: * if adding a new controller, ensure that if using --limit, all controllers are included and do not use serial mode * if removing a controller, reconfigure other controllers to update the RabbitMQ & Galera cluster nodes etc. > > Eric From arne.wiebalck at cern.ch Mon Aug 24 08:24:05 2020 From: arne.wiebalck at cern.ch (Arne Wiebalck) Date: Mon, 24 Aug 2020 10:24:05 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: Hi! CERN's deployment is using the iscsi deploy interface since we started with Ironic a couple of years ago (and we installed around 5000 nodes with it by now). The reason we chose it at the time was simplicity: we did not (and still do not) have a Swift backend to Glance, and the iscsi interface provided a straightforward alternative. While we have not seen obscure bugs/issues with it, I can certainly back the scalability issues mentioned by Dmitry: the tunneling of the images through the controllers can create issues when deploying hundreds of nodes at the same time. The security of the iscsi interface is less of a concern in our specific environment. So, why did we not move to direct (yet)? In addition to the lack of Swift, mostly since iscsi works for us and the scalability issues were not that much of a burning problem ... so we focused on other things :) Here are some thoughts/suggestions for this discussion: How would 'direct' work with other Glance backends (like Ceph/RBD in our case)? If using direct requires to duplicate images from Glance to Ironic (or somewhere else) to be served, I think this would be an argument against deprecating iscsi. Equally, if this would require to completely move the Glance backend to something else, like from RBD to RadosGW, I would not expect happy operators. (Does anyone know if RadosGW could even replace Swift for this specific use case?) Do we have numbers on how many deployments use iscsi vs direct? 
If many rely on iscsi, I would also suggest to establish a migration guide for operators on how to move from iscsi to direct, for the various configs. Recent versions of Glance support multiple backends, so a migration path may be to add a new (direct compatible) backend for new images. Cheers, Arne On 20.08.20 17:49, Julia Kreger wrote: > I'm having a sense of deja vu! > > Because of the way the mechanics work, the iscsi deploy driver is in > an unfortunate position of being harder to troubleshoot and diagnose > failures. Which basically means we've not been able to really identify > common failures and add logic to handle them appropriately, like we > are able to with a tcp socket and file download. Based on this alone, > I think it makes a solid case for us to seriously consider > deprecation. > > Overall, I'm +1 for the proposal and I believe over two cycles is the > right way to go. > > I suspect we're going to have lots of push back from the TripleO > community because there has been resistance to change their default > usage in the past. As such I'm adding them to the subject so hopefully > they will be at least aware. > > I guess my other worry is operators who already have a substantial > operational infrastructure investment built around the iscsi deploy > interface. I wonder why they didn't use direct, but maybe they have > all migrated in the past ?5? years. This could just be a non-concern > in reality, I'm just not sure. > > Of course, if someone is willing to step up and make the iscsi > deployment interface their primary focus, that also shifts the > discussion to making direct the default interface? > > -Julia > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur wrote: >> >> Hi all, >> >> Side note for those lacking context: this proposal concerns deprecating one of the ironic deploy interfaces detailed in https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It does not affect the boot-from-iSCSI feature. >> >> I would like to propose deprecating and removing the 'iscsi' deploy interface over the course of the next 2 cycles. The reasons are: >> 1) The iSCSI deploy is a source of occasional cryptic bugs when a target cannot be discovered or mounted properly. >> 2) Its security is questionable: I don't think we even use authentication. >> 3) Operators confusion: right now we default to the iSCSI deploy but pretty much direct everyone who cares about scalability or security to the 'direct' deploy. >> 4) Cost of maintenance: our feature set is growing, our team - not so much. iscsi_deploy.py is 800 lines of code that can be removed, and some dependencies that can be dropped as well. >> >> As far as I can remember, we've kept the iSCSI deploy for two reasons: >> 1) The direct deploy used to require Glance with Swift backend. The recently added [agent]image_download_source option allows caching and serving images via the ironic's HTTP server, eliminating this problem. I guess we'll have to switch to 'http' by default for this option to keep the out-of-box experience. >> 2) Memory footprint of the direct deploy. With the raw images streaming we no longer have to cache the downloaded images in the agent memory, removing this problem as well (I'm not even sure how much of a problem it is in 2020, even my phone has 4GiB of RAM). >> >> If this proposal is accepted, I suggest to execute it as follows: >> Victoria release: >> 1) Put an early deprecation warning in the release notes. 
>> 2) Announce the future change of the default value for [agent]image_download_source. >> W release: >> 3) Change [agent]image_download_source to 'http' by default. >> 4) Remove iscsi from the default enabled_deploy_interfaces and move it to the back of the supported list (effectively making direct deploy the default). >> X release: >> 5) Remove the iscsi deploy code from both ironic and IPA. >> >> Thoughts, opinions, suggestions? >> >> Dmitry > From dtantsur at redhat.com Mon Aug 24 08:32:57 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Mon, 24 Aug 2020 10:32:57 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: Hi, On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck wrote: > Hi! > > CERN's deployment is using the iscsi deploy interface since we started > with Ironic a couple of years ago (and we installed around 5000 nodes > with it by now). The reason we chose it at the time was simplicity: we > did not (and still do not) have a Swift backend to Glance, and the iscsi > interface provided a straightforward alternative. > > While we have not seen obscure bugs/issues with it, I can certainly back > the scalability issues mentioned by Dmitry: the tunneling of the images > through the controllers can create issues when deploying hundreds of > nodes at the same time. The security of the iscsi interface is less of a > concern in our specific environment. > > So, why did we not move to direct (yet)? In addition to the lack of > Swift, mostly since iscsi works for us and the scalability issues were > not that much of a burning problem ... so we focused on other things :) > > Here are some thoughts/suggestions for this discussion: > > How would 'direct' work with other Glance backends (like Ceph/RBD in our > case)? If using direct requires to duplicate images from Glance to > Ironic (or somewhere else) to be served, I think this would be an > argument against deprecating iscsi. > With image_download_source=http ironic will download the image to the conductor to be able serve it to the node. Which is exactly what the iscsi is doing, so not much of a change for you (except for s/iSCSI/HTTP/ as a means of serving the image). Would it be an option for you to test direct deploy with image_download_source=http? > > Equally, if this would require to completely move the Glance backend to > something else, like from RBD to RadosGW, I would not expect happy > operators. (Does anyone know if RadosGW could even replace Swift for > this specific use case?) > AFAIK ironic works with RadosGW, we have some support code for it. > > Do we have numbers on how many deployments use iscsi vs direct? If many > rely on iscsi, I would also suggest to establish a migration guide for > operators on how to move from iscsi to direct, for the various configs. > Recent versions of Glance support multiple backends, so a migration path > may be to add a new (direct compatible) backend for new images. > I don't have any numbers, but a migration guide is a must in any case. I expect most TripleO consumers to use the iscsi deploy, but only because it's the default. Their Edge solution uses the direct deploy. I've polled a few operators I know, they all (except for you, obviously :) seem to use the direct deploy. Metal3 uses direct deploy. Dmitry > > Cheers, > Arne > > On 20.08.20 17:49, Julia Kreger wrote: > > I'm having a sense of deja vu! 
> > > > Because of the way the mechanics work, the iscsi deploy driver is in > > an unfortunate position of being harder to troubleshoot and diagnose > > failures. Which basically means we've not been able to really identify > > common failures and add logic to handle them appropriately, like we > > are able to with a tcp socket and file download. Based on this alone, > > I think it makes a solid case for us to seriously consider > > deprecation. > > > > Overall, I'm +1 for the proposal and I believe over two cycles is the > > right way to go. > > > > I suspect we're going to have lots of push back from the TripleO > > community because there has been resistance to change their default > > usage in the past. As such I'm adding them to the subject so hopefully > > they will be at least aware. > > > > I guess my other worry is operators who already have a substantial > > operational infrastructure investment built around the iscsi deploy > > interface. I wonder why they didn't use direct, but maybe they have > > all migrated in the past ?5? years. This could just be a non-concern > > in reality, I'm just not sure. > > > > Of course, if someone is willing to step up and make the iscsi > > deployment interface their primary focus, that also shifts the > > discussion to making direct the default interface? > > > > -Julia > > > > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur > wrote: > >> > >> Hi all, > >> > >> Side note for those lacking context: this proposal concerns deprecating > one of the ironic deploy interfaces detailed in > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It > does not affect the boot-from-iSCSI feature. > >> > >> I would like to propose deprecating and removing the 'iscsi' deploy > interface over the course of the next 2 cycles. The reasons are: > >> 1) The iSCSI deploy is a source of occasional cryptic bugs when a > target cannot be discovered or mounted properly. > >> 2) Its security is questionable: I don't think we even use > authentication. > >> 3) Operators confusion: right now we default to the iSCSI deploy but > pretty much direct everyone who cares about scalability or security to the > 'direct' deploy. > >> 4) Cost of maintenance: our feature set is growing, our team - not so > much. iscsi_deploy.py is 800 lines of code that can be removed, and some > dependencies that can be dropped as well. > >> > >> As far as I can remember, we've kept the iSCSI deploy for two reasons: > >> 1) The direct deploy used to require Glance with Swift backend. The > recently added [agent]image_download_source option allows caching and > serving images via the ironic's HTTP server, eliminating this problem. I > guess we'll have to switch to 'http' by default for this option to keep the > out-of-box experience. > >> 2) Memory footprint of the direct deploy. With the raw images streaming > we no longer have to cache the downloaded images in the agent memory, > removing this problem as well (I'm not even sure how much of a problem it > is in 2020, even my phone has 4GiB of RAM). > >> > >> If this proposal is accepted, I suggest to execute it as follows: > >> Victoria release: > >> 1) Put an early deprecation warning in the release notes. > >> 2) Announce the future change of the default value for > [agent]image_download_source. > >> W release: > >> 3) Change [agent]image_download_source to 'http' by default. 
> >> 4) Remove iscsi from the default enabled_deploy_interfaces and move it > to the back of the supported list (effectively making direct deploy the > default). > >> X release: > >> 5) Remove the iscsi deploy code from both ironic and IPA. > >> > >> Thoughts, opinions, suggestions? > >> > >> Dmitry > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lucasagomes at gmail.com Mon Aug 24 08:34:30 2020 From: lucasagomes at gmail.com (Lucas Alvares Gomes) Date: Mon, 24 Aug 2020 09:34:30 +0100 Subject: [neutron] Bug Deputy Report Aug 17-24 Message-ID: Hi, This is the Neutron bug report of the week of 2020-08-17. High: * https://bugs.launchpad.net/neutron/+bug/1891673 - "qrouter ns ip rules not deleted when fip removed from vm" Assigned to: hopem * https://bugs.launchpad.net/neutron/+bug/1892017 - "Neutron server logs are too big in the gate jobs" Assigned to: slaweq * https://bugs.launchpad.net/neutron/+bug/1892477 - "[OVN] Avoid nb_cfg update notification flooding during agents health check" Assigned to: lucasagomes * https://bugs.launchpad.net/neutron/+bug/1892489 - "[Prefix delegation] When subnet with PD enabled is added to the router, L3 agent fails on waiting for LLAs to be available" Assigned to: slaweq Needs further triage: * https://bugs.launchpad.net/neutron/+bug/1892405 - "Removing router interface causes router to stop routing between all" Unassigned * https://bugs.launchpad.net/neutron/+bug/1892496 - "500 on SG deletion: Cannot delete or update a parent row" Unassigned Medium: * https://bugs.launchpad.net/neutron/+bug/1892364 - "L3 agent prefix delegation - adding new subnet to the router fails" Assigned to: brian-haley * https://bugs.launchpad.net/neutron/+bug/1892362 - "Restarting L3 agent when PD is used fails due to IPAddressAlreadyExists error" Assigned to: slaweq Wishlist: * https://bugs.launchpad.net/neutron/+bug/1892200 - "Make keepalived healthcheck more configurable" Unassigned From arne.wiebalck at cern.ch Mon Aug 24 09:03:15 2020 From: arne.wiebalck at cern.ch (Arne Wiebalck) Date: Mon, 24 Aug 2020 11:03:15 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: <91efc60b-6995-eda0-4ff9-7d6ae6a31641@cern.ch> Hi Dmitry, On 24.08.20 10:32, Dmitry Tantsur wrote: > Hi, > > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck > wrote: > > Hi! > > CERN's deployment is using the iscsi deploy interface since we started > with Ironic a couple of years ago (and we installed around 5000 nodes > with it by now). The reason we chose it at the time was simplicity: we > did not (and still do not) have a Swift backend to Glance, and the iscsi > interface provided a straightforward alternative. > > While we have not seen obscure bugs/issues with it, I can certainly back > the scalability issues mentioned by Dmitry: the tunneling of the images > through the controllers can create issues when deploying hundreds of > nodes at the same time. The security of the iscsi interface is less > of a > concern in our specific environment. > > So, why did we not move to direct (yet)? In addition to the lack of > Swift, mostly since iscsi works for us and the scalability issues were > not that much of a burning problem ... so we focused on other things :) > > Here are some thoughts/suggestions for this discussion: > > How would 'direct' work with other Glance backends (like Ceph/RBD in > our > case)? 
If using direct requires to duplicate images from Glance to > Ironic (or somewhere else) to be served, I think this would be an > argument against deprecating iscsi. > > > With image_download_source=http ironic will download the image to the > conductor to be able serve it to the node. Which is exactly what the > iscsi is doing, so not much of a change for you (except for > s/iSCSI/HTTP/ as a means of serving the image). > > Would it be an option for you to test direct deploy with > image_download_source=http? Oh, absolutely! I was not aware that setting this option would make Ironic act as an image buffer (I thought this would expect some URL the admin had to provide) ... I will try this and let you know. > > > Equally, if this would require to completely move the Glance backend to > something else, like from RBD to RadosGW, I would not expect happy > operators. (Does anyone know if RadosGW could even replace Swift for > this specific use case?) > > > AFAIK ironic works with RadosGW, we have some support code for it. I was mostly asking to see if RadosGW is a (longer term) option to fully benefit from direct's inherent scaling. > > > Do we have numbers on how many deployments use iscsi vs direct? If many > rely on iscsi, I would also suggest to establish a migration guide for > operators on how to move from iscsi to direct, for the various configs. > Recent versions of Glance support multiple backends, so a migration path > may be to add a new (direct compatible) backend for new images. > > > I don't have any numbers, but a migration guide is a must in any case. > > I expect most TripleO consumers to use the iscsi deploy, but only > because it's the default. Their Edge solution uses the direct deploy. > I've polled a few operators I know, they all (except for you, obviously > :) seem to use the direct deploy. Metal3 uses direct deploy. Thanks! Arne > Dmitry > > > Cheers, >   Arne > > On 20.08.20 17:49, Julia Kreger wrote: > > I'm having a sense of deja vu! > > > > Because of the way the mechanics work, the iscsi deploy driver is in > > an unfortunate position of being harder to troubleshoot and diagnose > > failures. Which basically means we've not been able to really > identify > > common failures and add logic to handle them appropriately, like we > > are able to with a tcp socket and file download. Based on this alone, > > I think it makes a solid case for us to seriously consider > > deprecation. > > > > Overall, I'm +1 for the proposal and I believe over two cycles is the > > right way to go. > > > > I suspect we're going to have lots of push back from the TripleO > > community because there has been resistance to change their default > > usage in the past. As such I'm adding them to the subject so > hopefully > > they will be at least aware. > > > > I guess my other worry is operators who already have a substantial > > operational infrastructure investment built around the iscsi deploy > > interface. I wonder why they didn't use direct, but maybe they have > > all migrated in the past ?5? years. This could just be a non-concern > > in reality, I'm just not sure. > > > > Of course, if someone is willing to step up and make the iscsi > > deployment interface their primary focus, that also shifts the > > discussion to making direct the default interface? 
> > > > -Julia > > > > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur > > wrote: > >> > >> Hi all, > >> > >> Side note for those lacking context: this proposal concerns > deprecating one of the ironic deploy interfaces detailed in > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. > It does not affect the boot-from-iSCSI feature. > >> > >> I would like to propose deprecating and removing the 'iscsi' > deploy interface over the course of the next 2 cycles. The reasons are: > >> 1) The iSCSI deploy is a source of occasional cryptic bugs when > a target cannot be discovered or mounted properly. > >> 2) Its security is questionable: I don't think we even use > authentication. > >> 3) Operators confusion: right now we default to the iSCSI deploy > but pretty much direct everyone who cares about scalability or > security to the 'direct' deploy. > >> 4) Cost of maintenance: our feature set is growing, our team - > not so much. iscsi_deploy.py is 800 lines of code that can be > removed, and some dependencies that can be dropped as well. > >> > >> As far as I can remember, we've kept the iSCSI deploy for two > reasons: > >> 1) The direct deploy used to require Glance with Swift backend. > The recently added [agent]image_download_source option allows > caching and serving images via the ironic's HTTP server, eliminating > this problem. I guess we'll have to switch to 'http' by default for > this option to keep the out-of-box experience. > >> 2) Memory footprint of the direct deploy. With the raw images > streaming we no longer have to cache the downloaded images in the > agent memory, removing this problem as well (I'm not even sure how > much of a problem it is in 2020, even my phone has 4GiB of RAM). > >> > >> If this proposal is accepted, I suggest to execute it as follows: > >> Victoria release: > >> 1) Put an early deprecation warning in the release notes. > >> 2) Announce the future change of the default value for > [agent]image_download_source. > >> W release: > >> 3) Change [agent]image_download_source to 'http' by default. > >> 4) Remove iscsi from the default enabled_deploy_interfaces and > move it to the back of the supported list (effectively making direct > deploy the default). > >> X release: > >> 5) Remove the iscsi deploy code from both ironic and IPA. > >> > >> Thoughts, opinions, suggestions? > >> > >> Dmitry > > > From aaronzhu1121 at gmail.com Mon Aug 24 09:13:24 2020 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Mon, 24 Aug 2020 17:13:24 +0800 Subject: [MURANO] Murano Class error when try to deploy WordPress APP In-Reply-To: References: Message-ID: Hi, Sorry for the later reply. Recently I am busy with some internal works, I don't have much time to debug this. I think you should check the app package first. I will debug it when I am free. İzzettin Erdem 于2020年8月20日 周四20:47写道: > Hello everyone, > > WordPress needs Mysql, HTTP and Zabbix Server/Agent. These apps run > individually with succes but when I try to deploy WordPress App on Murano > it gives the error about Apache HTTP that mentioned below. > > How can I fix this? Do you have any suggestions? > > Error: > http://paste.openstack.org/show/796980/ > http://paste.openstack.org/show/796983/ (cont.) > > > -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From luyao.zhong at intel.com  Mon Aug 24 09:19:56 2020
From: luyao.zhong at intel.com (Zhong, Luyao)
Date: Mon, 24 Aug 2020 09:19:56 +0000
Subject: [Nova] We are dropping the 'delete_instance_files' virt driver interface
Message-ID: <183EFA13E8A23E4AA7057ED9BCC1102E3E542685@SHSMSX107.ccr.corp.intel.com>

Hi all, especially maintainers of out-of-tree drivers,

Please pay attention to this change.
https://review.opendev.org/#/c/714653/

We are dropping the 'delete_instance_files' virt driver interface and will
use "cleanup_instance" to take charge of lingering instance cleanup,
including deleting instance files and whatever we add in the future.

Best Regards,
Luyao

From thierry at openstack.org  Mon Aug 24 10:02:19 2020
From: thierry at openstack.org (Thierry Carrez)
Date: Mon, 24 Aug 2020 12:02:19 +0200
Subject: [largescale-sig] Next meeting: August 26, 8utc
Message-ID: 

Hi everyone,

Our next meeting will be an EU-APAC-friendly meeting, on Wednesday,
August 26 at 8 UTC[1] in the #openstack-meeting-3 channel on IRC:

https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200826T08

Feel free to add topics to our agenda at:

https://etherpad.openstack.org/p/large-scale-sig-meeting

A reminder of the TODOs we had from last meeting, in case you have time to
make progress on them:

- amorin to add some meat to the wiki page before we push the Nova doc
  patch further
- all to describe briefly how you solved metrics/billing in your
  deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation

Talk to you all on Wednesday,

-- 
Thierry Carrez

From smooney at redhat.com  Mon Aug 24 11:41:51 2020
From: smooney at redhat.com (Sean Mooney)
Date: Mon, 24 Aug 2020 12:41:51 +0100
Subject: About Devstack
In-Reply-To: <17933c25-2fe9-4f0e-b1c5-797d4a97a5dc@gmail.com>
References: <17933c25-2fe9-4f0e-b1c5-797d4a97a5dc@gmail.com>
Message-ID: <9d33bc9f1bc819c4041b2b7217c652e685ed35c7.camel@redhat.com>

On Mon, 2020-08-24 at 16:32 +0900, Bernd Bausch wrote:
> Devstack is not meant to be restarted. However, setting the IP address
> on br-ex and bringing it up is normally sufficient to re-establish
> networking. After that, you probably still need to recreate the loop
> devices for Cinder and Swift.
devstack used to support restart with a separate script. That was removed,
but when we later swapped to systemd it almost fixed restart. There are a
couple of things that are not done properly to allow restart to work, but
really we should probably just fix those. Some run directories are missing
after the reboot, which prevents some wsgi services from restarting; if you
correct that, I think that is basically all that is needed.
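[Editor's note: a rough sketch of the kind of manual fix-up discussed above for a rebooted devstack host. The run directory, the systemd unit glob and the outbound interface name are assumptions and will differ between setups; the 172.24.4.0/24 values come from this thread.]

  # restore the address devstack normally puts on br-ex
  sudo ip addr add 172.24.4.1/24 dev br-ex
  sudo ip link set br-ex up

  # recreate any run directories the wsgi services complain about, then restart
  sudo mkdir -p /var/run/uwsgi        # assumed path, check the failing unit's logs
  sudo systemctl restart "devstack@*"

  # optionally NAT the floating range out of the host (eth0 is a placeholder)
  sudo sysctl -w net.ipv4.ip_forward=1
  sudo iptables -t nat -A POSTROUTING -s 172.24.4.0/24 -o eth0 -j MASQUERADE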
there are some devstack docs on how to configure networing https://github.com/openstack/devstack/blob/master/doc/source/networking.rst but basically devstack will not majically make your openstack network routeable on your physical network you have to do some manual operatoin on your router to make that happen. > > Bernd. > > On 8/22/2020 10:24 AM, 閺夊骸绻旀潻锟 wrote: > > I'm sorry to disturb you. > > Recently, I tried to install openstack through devstack. When I input > > "./stack.sh". I can install openstack successfully. > > Then I tried to create a cloud instance and use the public network > > 172.24.4.0/24 which is created during > > installation( this subnet is created by default, I didn't configure > > network informartion in local.conf before installation). And the > > instance can access to the Internet smoothly. > > But the instance will not access the Internet when I reboot my server鑱 > > (physical machine). After rebooting, I input "sudo ifconfig br-ex > > 172.24.4.1/24 up", the instance can access my > > server IP, but it can't PING the gateway addresses of my server. Of > > course, the instance also can't access the Internet. But my server can > > PING it's gateway and access to the Internet. Finally, the cloud > > instance can only communicate with my server. > > I tried many methods to restore the network environment of my > > openstack. But I can't find the reason. So I need your help. I install > > the version of devstack is stable/train. Thank you very much! From smooney at redhat.com Mon Aug 24 11:52:48 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Aug 2020 12:52:48 +0100 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> On Mon, 2020-08-24 at 10:32 +0200, Dmitry Tantsur wrote: > Hi, > > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck > wrote: > > > Hi! > > > > CERN's deployment is using the iscsi deploy interface since we started > > with Ironic a couple of years ago (and we installed around 5000 nodes > > with it by now). The reason we chose it at the time was simplicity: we > > did not (and still do not) have a Swift backend to Glance, and the iscsi > > interface provided a straightforward alternative. > > > > While we have not seen obscure bugs/issues with it, I can certainly back > > the scalability issues mentioned by Dmitry: the tunneling of the images > > through the controllers can create issues when deploying hundreds of > > nodes at the same time. The security of the iscsi interface is less of a > > concern in our specific environment. > > > > So, why did we not move to direct (yet)? In addition to the lack of > > Swift, mostly since iscsi works for us and the scalability issues were > > not that much of a burning problem ... so we focused on other things :) > > > > Here are some thoughts/suggestions for this discussion: > > > > How would 'direct' work with other Glance backends (like Ceph/RBD in our > > case)? If using direct requires to duplicate images from Glance to > > Ironic (or somewhere else) to be served, I think this would be an > > argument against deprecating iscsi. > > > > With image_download_source=http ironic will download the image to the > conductor to be able serve it to the node. Which is exactly what the iscsi > is doing, so not much of a change for you (except for s/iSCSI/HTTP/ as a > means of serving the image). > > Would it be an option for you to test direct deploy with > image_download_source=http? 
I think if there is still an option to not force deployments to alter any of their other services, this is likely OK, but I think the onus should be on the ironic and TripleO teams to ensure there is an upgrade path for those users before this deprecation becomes a removal, without deploying Swift or a Swift-compatible API, e.g. RadosGW. Perhaps a CI job could be put in place, maybe using grenade, that starts with iscsi and moves to direct with http provided, to show that just setting that will allow the conductor to download the image from glance and serve it to the IPA. Unlike CERN, I just use ironic in a tiny home deployment where I have an all-in-one deployment + 4 additional nodes for ironic. I can't deploy Swift as all my disks are already in use for Cinder, so down the line when I eventually upgrade to Victoria and Wallaby I would either have to drop ironic or not upgrade it if there is not an option to just pull the image from glance, or glance via the conductor. Enhancing the IPA to pull directly from glance would also probably work for many who use iscsi today, but that would depend on your network topology I guess. > > > > > > > > Equally, if this would require to completely move the Glance backend to > > > something else, like from RBD to RadosGW, I would not expect happy > > > operators. (Does anyone know if RadosGW could even replace Swift for > > > this specific use case?) > > > > AFAIK ironic works with RadosGW, we have some support code for it. > > > > > > Do we have numbers on how many deployments use iscsi vs direct? If many > > > rely on iscsi, I would also suggest to establish a migration guide for > > > operators on how to move from iscsi to direct, for the various configs. > > > Recent versions of Glance support multiple backends, so a migration path > > > may be to add a new (direct compatible) backend for new images. > > > > I don't have any numbers, but a migration guide is a must in any case. > > I expect most TripleO consumers to use the iscsi deploy, but only because > > it's the default. Their Edge solution uses the direct deploy. I've polled a > > few operators I know, they all (except for you, obviously :) seem to use > > the direct deploy. Metal3 uses direct deploy. > > > > Dmitry > > > > > > > > > > Cheers, > > > Arne > > > > > > On 20.08.20 17:49, Julia Kreger wrote: > > > > I'm having a sense of deja vu! > > > > > > > > Because of the way the mechanics work, the iscsi deploy driver is in > > > > an unfortunate position of being harder to troubleshoot and diagnose > > > > failures. Which basically means we've not been able to really identify > > > > common failures and add logic to handle them appropriately, like we > > > > are able to with a tcp socket and file download. Based on this alone, > > > > I think it makes a solid case for us to seriously consider > > > > deprecation. > > > > > > > > Overall, I'm +1 for the proposal and I believe over two cycles is the > > > > right way to go. > > > > > > > > I suspect we're going to have lots of push back from the TripleO > > > > community because there has been resistance to change their default > > > > usage in the past. As such I'm adding them to the subject so hopefully > > > > they will be at least aware. > > > > > > > > I guess my other worry is operators who already have a substantial > > > > operational infrastructure investment built around the iscsi deploy > > > > interface. I wonder why they didn't use direct, but maybe they have > > > > all migrated in the past ?5? years. This could just be a non-concern > > > > in reality, I'm just not sure.
> > > > > > Of course, if someone is willing to step up and make the iscsi > > > deployment interface their primary focus, that also shifts the > > > discussion to making direct the default interface? > > > > > > -Julia > > > > > > > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur > > > > wrote: > > > > > > > > Hi all, > > > > > > > > Side note for those lacking context: this proposal concerns deprecating > > > > one of the ironic deploy interfaces detailed in > > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It > > does not affect the boot-from-iSCSI feature. > > > > > > > > I would like to propose deprecating and removing the 'iscsi' deploy > > > > interface over the course of the next 2 cycles. The reasons are: > > > > 1) The iSCSI deploy is a source of occasional cryptic bugs when a > > > > target cannot be discovered or mounted properly. > > > > 2) Its security is questionable: I don't think we even use > > > > authentication. > > > > 3) Operators confusion: right now we default to the iSCSI deploy but > > > > pretty much direct everyone who cares about scalability or security to the > > 'direct' deploy. > > > > 4) Cost of maintenance: our feature set is growing, our team - not so > > > > much. iscsi_deploy.py is 800 lines of code that can be removed, and some > > dependencies that can be dropped as well. > > > > > > > > As far as I can remember, we've kept the iSCSI deploy for two reasons: > > > > 1) The direct deploy used to require Glance with Swift backend. The > > > > recently added [agent]image_download_source option allows caching and > > serving images via the ironic's HTTP server, eliminating this problem. I > > guess we'll have to switch to 'http' by default for this option to keep the > > out-of-box experience. > > > > 2) Memory footprint of the direct deploy. With the raw images streaming > > > > we no longer have to cache the downloaded images in the agent memory, > > removing this problem as well (I'm not even sure how much of a problem it > > is in 2020, even my phone has 4GiB of RAM). > > > > > > > > If this proposal is accepted, I suggest to execute it as follows: > > > > Victoria release: > > > > 1) Put an early deprecation warning in the release notes. > > > > 2) Announce the future change of the default value for > > > > [agent]image_download_source. > > > > W release: > > > > 3) Change [agent]image_download_source to 'http' by default. > > > > 4) Remove iscsi from the default enabled_deploy_interfaces and move it > > > > to the back of the supported list (effectively making direct deploy the > > default). > > > > X release: > > > > 5) Remove the iscsi deploy code from both ironic and IPA. > > > > > > > > Thoughts, opinions, suggestions? > > > > > > > > Dmitry > > > > From fungi at yuggoth.org Mon Aug 24 11:58:37 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 24 Aug 2020 11:58:37 +0000 Subject: [ovn][neutron][ussuri] DPDK support on OVN Openstack? Openstack Doc contradictive Information? In-Reply-To: References: Message-ID: <20200824115836.jysj7yegqkrpndhn@yuggoth.org> On 2020-08-24 13:57:59 +0700 (+0700), Popoi Zen wrote: [...] > I cant get metadata because of tcp checksum error. [...] I'm not familiar with the rest of the challenges you're facing, but consider using configdrive for metadata access. It's generally more reliable and resilient than trying to retrieve metadata over the network. 
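If you want to try it quickly, config drive can be requested per server at boot time, roughly like this (the image/flavor/network names are just placeholders):

    openstack server create --config-drive True --image <image> --flavor <flavor> --network <network> test-server

or enforced for every instance by setting force_config_drive = True in the [DEFAULT] section of nova.conf on the compute nodes. The full details are in the config drive documentation: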
https://docs.openstack.org/nova/ussuri/admin/config-drive.html -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From dtantsur at redhat.com Mon Aug 24 12:05:47 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Mon, 24 Aug 2020 14:05:47 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> References: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> Message-ID: On Mon, Aug 24, 2020 at 1:52 PM Sean Mooney wrote: > On Mon, 2020-08-24 at 10:32 +0200, Dmitry Tantsur wrote: > > Hi, > > > > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck > > wrote: > > > > > Hi! > > > > > > CERN's deployment is using the iscsi deploy interface since we started > > > with Ironic a couple of years ago (and we installed around 5000 nodes > > > with it by now). The reason we chose it at the time was simplicity: we > > > did not (and still do not) have a Swift backend to Glance, and the > iscsi > > > interface provided a straightforward alternative. > > > > > > While we have not seen obscure bugs/issues with it, I can certainly > back > > > the scalability issues mentioned by Dmitry: the tunneling of the images > > > through the controllers can create issues when deploying hundreds of > > > nodes at the same time. The security of the iscsi interface is less of > a > > > concern in our specific environment. > > > > > > So, why did we not move to direct (yet)? In addition to the lack of > > > Swift, mostly since iscsi works for us and the scalability issues were > > > not that much of a burning problem ... so we focused on other things :) > > > > > > Here are some thoughts/suggestions for this discussion: > > > > > > How would 'direct' work with other Glance backends (like Ceph/RBD in > our > > > case)? If using direct requires to duplicate images from Glance to > > > Ironic (or somewhere else) to be served, I think this would be an > > > argument against deprecating iscsi. > > > > > > > With image_download_source=http ironic will download the image to the > > conductor to be able serve it to the node. Which is exactly what the > iscsi > > is doing, so not much of a change for you (except for s/iSCSI/HTTP/ as a > > means of serving the image). > > > > Would it be an option for you to test direct deploy with > > image_download_source=http? > i think if there is still an option to not force deployemnt to altere any > of there > other sevices this is likely ok but i think the onious shoudl be on the > ironic > and ooo teams to ensure there is an upgrade path for those useres before > this deprecation > becomes a removal without deploying swift or a swift compatibale api e.g. > RadosGW > Swift is NOT a requirement (nor is RadosGW) when image_download_source=http is used. Any glance backend (or no glance at all) will work. > > perhaps a ci job could be put in place maybe using grenade that starts > with iscsi and moves > to direct with http porvided to show that just setting that weill allow > the conductor to download > the image from glance and server it to the ipa. > We already have CI jobs that do it, I'm not sure what grenade would win us? At this point, we keep grenade jobs barely working at all (actually, the multinode grenade job is not working), we cannot add anything there. 
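For anyone who wants to try the direct deploy with the HTTP image source ahead of any change of defaults, it is only configuration. A rough sketch (image_download_source is the option discussed above; the other option names are from memory, so double-check them against the admin guide for your release):

    [DEFAULT]
    enabled_deploy_interfaces = direct,iscsi
    default_deploy_interface = direct

    [agent]
    image_download_source = http

in ironic.conf, and individual nodes can be switched with something like "openstack baremetal node set <node> --deploy-interface direct".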
Dmitry > > > unlike cern i just use ironic in a tiny home deployment where i have an > all in one deployment + 4 addtional > nodes for ironic. i cant deploy swift as all my disks are already in use > for cinder so down the line when > i eventually upgrade to vicortia and wallaby i would either have to drop > ironic or not upgrade it > if there is not a option to just pull the image from glance or glance via > the conductor. enhancing the ipa > to pull directly from glance would also proably work for many who use > iscsi today but that would depend on your network > toplogy i guess. > > > > > > > > > > Equally, if this would require to completely move the Glance backend to > > > something else, like from RBD to RadosGW, I would not expect happy > > > operators. (Does anyone know if RadosGW could even replace Swift for > > > this specific use case?) > > > > > > > AFAIK ironic works with RadosGW, we have some support code for it. > > > > > > > > > > Do we have numbers on how many deployments use iscsi vs direct? If many > > > rely on iscsi, I would also suggest to establish a migration guide for > > > operators on how to move from iscsi to direct, for the various configs. > > > Recent versions of Glance support multiple backends, so a migration > path > > > may be to add a new (direct compatible) backend for new images. > > > > > > > I don't have any numbers, but a migration guide is a must in any case. > > > > I expect most TripleO consumers to use the iscsi deploy, but only because > > it's the default. Their Edge solution uses the direct deploy. I've > polled a > > few operators I know, they all (except for you, obviously :) seem to use > > the direct deploy. Metal3 uses direct deploy. > > > > Dmitry > > > > > > > > > > Cheers, > > > Arne > > > > > > On 20.08.20 17:49, Julia Kreger wrote: > > > > I'm having a sense of deja vu! > > > > > > > > Because of the way the mechanics work, the iscsi deploy driver is in > > > > an unfortunate position of being harder to troubleshoot and diagnose > > > > failures. Which basically means we've not been able to really > identify > > > > common failures and add logic to handle them appropriately, like we > > > > are able to with a tcp socket and file download. Based on this alone, > > > > I think it makes a solid case for us to seriously consider > > > > deprecation. > > > > > > > > Overall, I'm +1 for the proposal and I believe over two cycles is the > > > > right way to go. > > > > > > > > I suspect we're going to have lots of push back from the TripleO > > > > community because there has been resistance to change their default > > > > usage in the past. As such I'm adding them to the subject so > hopefully > > > > they will be at least aware. > > > > > > > > I guess my other worry is operators who already have a substantial > > > > operational infrastructure investment built around the iscsi deploy > > > > interface. I wonder why they didn't use direct, but maybe they have > > > > all migrated in the past ?5? years. This could just be a non-concern > > > > in reality, I'm just not sure. > > > > > > > > Of course, if someone is willing to step up and make the iscsi > > > > deployment interface their primary focus, that also shifts the > > > > discussion to making direct the default interface? 
> > > > > > > > -Julia > > > > > > > > > > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur > > > > > > wrote: > > > > > > > > > > Hi all, > > > > > > > > > > Side note for those lacking context: this proposal concerns > deprecating > > > > > > one of the ironic deploy interfaces detailed in > > > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. > It > > > does not affect the boot-from-iSCSI feature. > > > > > > > > > > I would like to propose deprecating and removing the 'iscsi' deploy > > > > > > interface over the course of the next 2 cycles. The reasons are: > > > > > 1) The iSCSI deploy is a source of occasional cryptic bugs when a > > > > > > target cannot be discovered or mounted properly. > > > > > 2) Its security is questionable: I don't think we even use > > > > > > authentication. > > > > > 3) Operators confusion: right now we default to the iSCSI deploy > but > > > > > > pretty much direct everyone who cares about scalability or security to > the > > > 'direct' deploy. > > > > > 4) Cost of maintenance: our feature set is growing, our team - not > so > > > > > > much. iscsi_deploy.py is 800 lines of code that can be removed, and > some > > > dependencies that can be dropped as well. > > > > > > > > > > As far as I can remember, we've kept the iSCSI deploy for two > reasons: > > > > > 1) The direct deploy used to require Glance with Swift backend. The > > > > > > recently added [agent]image_download_source option allows caching and > > > serving images via the ironic's HTTP server, eliminating this problem. > I > > > guess we'll have to switch to 'http' by default for this option to > keep the > > > out-of-box experience. > > > > > 2) Memory footprint of the direct deploy. With the raw images > streaming > > > > > > we no longer have to cache the downloaded images in the agent memory, > > > removing this problem as well (I'm not even sure how much of a problem > it > > > is in 2020, even my phone has 4GiB of RAM). > > > > > > > > > > If this proposal is accepted, I suggest to execute it as follows: > > > > > Victoria release: > > > > > 1) Put an early deprecation warning in the release notes. > > > > > 2) Announce the future change of the default value for > > > > > > [agent]image_download_source. > > > > > W release: > > > > > 3) Change [agent]image_download_source to 'http' by default. > > > > > 4) Remove iscsi from the default enabled_deploy_interfaces and > move it > > > > > > to the back of the supported list (effectively making direct deploy the > > > default). > > > > > X release: > > > > > 5) Remove the iscsi deploy code from both ironic and IPA. > > > > > > > > > > Thoughts, opinions, suggestions? > > > > > > > > > > Dmitry > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Mon Aug 24 12:06:22 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 24 Aug 2020 14:06:22 +0200 Subject: [nova] virtual PTG and Forum planning Message-ID: Hi, As you probably know the next virtual PTG will be held between October 26-30. I need to book time slots for Nova [1] so please add your availability to the doodle [2] before 7th of September. I have created an etherpad [3] to collect the PTG topics for the Nova sessions. Feel free to add your topics. Also there will be a Forum between October 19-23 [4]. You can use the PTG etherpad [3] to brainstorm forum topics before the official CFP opens. 
Cheers, gibi [1] https://ethercalc.openstack.org/7xp2pcbh1ncb [2] https://doodle.com/poll/a5pgqh7bypq8piew [3] https://etherpad.opendev.org/p/nova-wallaby-ptg [4] https://wiki.openstack.org/wiki/Forum/Virtual202 From smooney at redhat.com Mon Aug 24 12:26:50 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 24 Aug 2020 13:26:50 +0100 Subject: [ovn][neutron][ussuri] DPDK support on OVN Openstack? Openstack Doc contradictive Information? In-Reply-To: <20200824115836.jysj7yegqkrpndhn@yuggoth.org> References: <20200824115836.jysj7yegqkrpndhn@yuggoth.org> Message-ID: <3df1492ddefd4066df24736b8eba273992ba1d5a.camel@redhat.com> On Mon, 2020-08-24 at 11:58 +0000, Jeremy Stanley wrote: > On 2020-08-24 13:57:59 +0700 (+0700), Popoi Zen wrote: > [...] > > I cant get metadata because of tcp checksum error. > > [...] > > I'm not familiar with the rest of the challenges you're facing, but > consider using configdrive for metadata access. It's generally more > reliable and resilient than trying to retrieve metadata over the > network. > > https://docs.openstack.org/nova/ussuri/admin/config-drive.html ovn should not change the packet processing vs ml2/ovs with dpdk if you are haveing checksum issue its proably an issue with your underlying network or dpdk/ovs not the fact your using ovn. From arnaud.morin at gmail.com Mon Aug 24 12:41:54 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 24 Aug 2020 12:41:54 +0000 Subject: [neutron][ops] q-agent-notifier exchanges without bindings. In-Reply-To: References: Message-ID: <20200824124154.GA31915@sync> Hey, I did exactly the same on my side. I also have unroutable messages going in my alternate exchange, related to the same exchanges (q-agent-notifier-security_group-update_fanout, etc.) Did you figured out why you have unroutable messages like this? Are you using a custom neutron driver? Cheers, -- Arnaud Morin On 21.08.20 - 10:32, Fabian Zimmermann wrote: > Hi, > > im currently on the way to analyse some rabbitmq-issues. > > atm im taking a look on "unroutable messages", so I > > * created an Alternative Exchange and Queue: "unroutable" > * created a policy to send all unroutable msgs to this exchange/queue. > * wrote a script to show me the msgs placed here.. currently I get > > Seems like my neutron is placing msgs in these exchanges, but there is > nobody listening/binding to: > -- > 20 Exchange: q-agent-notifier-network-delete_fanout, RoutingKey: > 226 Exchange: q-agent-notifier-port-delete_fanout, RoutingKey: > 88 Exchange: q-agent-notifier-port-update_fanout, RoutingKey: > 388 Exchange: q-agent-notifier-security_group-update_fanout, RoutingKey: > -- > > Is someone able to give me a hint where to look at / how to debug this? > > Fabian > From gagehugo at gmail.com Mon Aug 24 13:17:43 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 24 Aug 2020 08:17:43 -0500 Subject: [openstack-helm] OpenStack-Helm Meeting Aug 25th Cancelled Message-ID: Good morning, The openstack-helm meeting for tomorrow, Aug 25th 2020 will be cancelled, we will see you all next week! -------------- next part -------------- An HTML attachment was scrubbed... URL: From gagehugo at gmail.com Mon Aug 24 13:18:43 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Mon, 24 Aug 2020 08:18:43 -0500 Subject: [security] Security SIG Meeting Aug 27th Cancelled Message-ID: Good morning, The security SIG meeting for Thursday, Aug 27th 2020 will be cancelled. We will meet next week at the usual time. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Mon Aug 24 14:12:17 2020 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 24 Aug 2020 16:12:17 +0200 Subject: [all] READMEs of zuul roles not rendered properly - missing content Message-ID: Hello everyone, I've noticed that READMEs of zuul roles within openstack projects are not rendered properly on opendev.org - ".. zuul:rolevar::" syntax seems to be the problem. Although it's rendered well on github.com, see f.e. [1] [2]. I wonder if there were some changes in the supported README syntax. Also the ".. zuul:rolevar::" syntax throws errors on online rst formatters I was testing on, however, it's rendered fine by md online formatters - maybe opendev.org is more rst strict in case of .rst files than github? Any ideas? [1] https://opendev.org/openstack/tempest/src/branch/master/roles/run-tempest [2] https://github.com/openstack/tempest/tree/master/roles/run-tempest Thanks, -- Martin Kopec Quality Engineer Red Hat EMEA -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Mon Aug 24 14:19:04 2020 From: eblock at nde.ag (Eugen Block) Date: Mon, 24 Aug 2020 14:19:04 +0000 Subject: [horizon] default create_volume setting can't be changed Message-ID: <20200824141904.Horde.biUwyDcXRQDK2D0KW6vwbE1@webmail.nde.ag> Hi *, we recently upgraded from Ocata to Train and I'm struggling with a specific setting: I believe since Pike version the default for "create_volume" changed to "true" when launching instances from Horizon dashboard. I would like to change that to "false" and set it in our custom /srv/www/openstack-dashboard/openstack_dashboard/local/local_settings.d/_100_local_settings.py: LAUNCH_INSTANCE_DEFAULTS = { 'config_drive': False, 'create_volume': False, 'hide_create_volume': False, 'disable_image': False, 'disable_instance_snapshot': False, 'disable_volume': False, 'disable_volume_snapshot': False, 'enable_scheduler_hints': True, } Other configs from this file work as expected, so that custom file can't be the reason. After apache and memcached restart nothing changes, the default is still "true". Can anyone shed some light, please? I haven't tried other configs yet so I can't tell if more options are affected. Thanks! Eugen From fungi at yuggoth.org Mon Aug 24 14:36:18 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 24 Aug 2020 14:36:18 +0000 Subject: [all][infra] READMEs of zuul roles not rendered properly - missing content In-Reply-To: References: Message-ID: <20200824143618.7xdecj67m5jzwpkz@yuggoth.org> On 2020-08-24 16:12:17 +0200 (+0200), Martin Kopec wrote: > I've noticed that READMEs of zuul roles within openstack projects > are not rendered properly on opendev.org - ".. zuul:rolevar::" > syntax seems to be the problem. Although it's rendered well on > github.com, see f.e. [1] [2]. > > I wonder if there were some changes in the supported README > syntax. Also the ".. zuul:rolevar::" syntax throws errors on > online rst formatters I was testing on, however, it's rendered > fine by md online formatters - maybe opendev.org is more rst > strict in case of .rst files than github? > > Any ideas? 
> > [1] https://opendev.org/openstack/tempest/src/branch/master/roles/run-tempest > [2] https://github.com/openstack/tempest/tree/master/roles/run-tempest Those wrappers rely on the zuul_sphinx plugin, which needs to be included in docs builds thusly: That extension allows you to build job and role documentation like this: https://zuul-ci.org/docs/zuul-jobs/ If the project doesn't plan to build such documentation, they can of course be omitted (openstack/tempest's doc/source/conf.py doesn't use zuul_sphinx that I can see). As far as why the README rendering in Gitea is tripping over it, sounds likely to be a bug in whatever reStructuredText parser library it uses. Was this working up until recently? We did just upgrade Gitea again in the past week or two. To be entirely honest, I wish Gitea didn't automatically attempt to render RST files, that makes it harder to actually refer to the source code for them, and it's a source code browser not a CMS for publishing documentation, but apparently this is a feature many other users do like for some reason. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cboylan at sapwetik.org Mon Aug 24 15:05:41 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 24 Aug 2020 08:05:41 -0700 Subject: =?UTF-8?Q?Re:_[all][infra]_READMEs_of_zuul_roles_not_rendered_properly_-?= =?UTF-8?Q?_missing_content?= In-Reply-To: <20200824143618.7xdecj67m5jzwpkz@yuggoth.org> References: <20200824143618.7xdecj67m5jzwpkz@yuggoth.org> Message-ID: On Mon, Aug 24, 2020, at 7:36 AM, Jeremy Stanley wrote: > On 2020-08-24 16:12:17 +0200 (+0200), Martin Kopec wrote: > > I've noticed that READMEs of zuul roles within openstack projects > > are not rendered properly on opendev.org - ".. zuul:rolevar::" > > syntax seems to be the problem. Although it's rendered well on > > github.com, see f.e. [1] [2]. > > > > I wonder if there were some changes in the supported README > > syntax. Also the ".. zuul:rolevar::" syntax throws errors on > > online rst formatters I was testing on, however, it's rendered > > fine by md online formatters - maybe opendev.org is more rst > > strict in case of .rst files than github? > > > > Any ideas? > > > > [1] https://opendev.org/openstack/tempest/src/branch/master/roles/run-tempest > > [2] https://github.com/openstack/tempest/tree/master/roles/run-tempest > > Those wrappers rely on the zuul_sphinx plugin, which needs to be > included in docs builds thusly: > > https://opendev.org/zuul/zuul-jobs/src/commit/1e92a67db6f5fa3f3284d5b1928f104c428187f3/doc/source/conf.py#L24 > > > That extension allows you to build job and role documentation like > this: > > https://zuul-ci.org/docs/zuul-jobs/ > > If the project doesn't plan to build such documentation, they can of > course be omitted (openstack/tempest's doc/source/conf.py doesn't > use zuul_sphinx that I can see). As far as why the README rendering > in Gitea is tripping over it, sounds likely to be a bug in whatever > reStructuredText parser library it uses. Was this working up until > recently? We did just upgrade Gitea again in the past week or two. We use pandoc to render the rst files, and the choice of tool and commands to run is driven entirely by config [3]. If we want to switch to another tool we'll want to ensure the new tool is installed in our container image [4]. Its possible that simply using the right pandoc options would get us what we want here? 
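For anyone curious, the external renderer hookup in Gitea is just an app.ini section along these lines (from memory, not our exact settings; [3] below has the real thing):

    [markup.restructuredtext]
    ENABLED = true
    FILE_EXTENSIONS = .rst
    RENDER_COMMAND = "pandoc --from rst --to html"
    IS_INPUT_FILE = false

so swapping the tool or its options is a one-line configuration change.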
> > To be entirely honest, I wish Gitea didn't automatically attempt to > render RST files, that makes it harder to actually refer to the > source code for them, and it's a source code browser not a CMS for > publishing documentation, but apparently this is a feature many > other users do like for some reason. We can change this behavior by removing the external renderer (though I expect we're in the minority of preferring ability to link to the source here). [3] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gitea/templates/app.ini.j2#L88-L95 [4] https://opendev.org/opendev/system-config/src/branch/master/docker/gitea/Dockerfile#L92-L94 > -- > Jeremy Stanley From emilien at redhat.com Mon Aug 24 15:28:12 2020 From: emilien at redhat.com (Emilien Macchi) Date: Mon, 24 Aug 2020 11:28:12 -0400 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: <1597847922905.32607@binero.com> Message-ID: I went ahead and added Takashi to the newly created puppet-tripleo core group in Gerrit. Thanks again for your hard work! On Thu, Aug 20, 2020 at 9:25 AM Karthik, Rajini wrote: > +1 . > > > > Rajini > > > > *From:* Wesley Hayutin > *Sent:* Wednesday, August 19, 2020 9:09 PM > *To:* openstack-discuss > *Cc:* Emilien Macchi > *Subject:* Re: [tripleo] Proposing Takashi Kajinami to be core on > puppet-tripleo > > > > [EXTERNAL EMAIL] > > > > > > On Wed, Aug 19, 2020 at 8:40 AM Tobias Urdin > wrote: > > Big +1 from an outsider :)) > > > > Best regards > > Tobias > > > ------------------------------ > > *From:* Rabi Mishra > *Sent:* Wednesday, August 19, 2020 3:37 PM > *To:* Emilien Macchi > *Cc:* openstack-discuss > *Subject:* Re: [tripleo] Proposing Takashi Kajinami to be core on > puppet-tripleo > > > > +1 > > > > On Tue, Aug 18, 2020 at 8:03 PM Emilien Macchi wrote: > > Hi people, > > > > If you don't know Takashi yet, he has been involved in the Puppet > OpenStack project and helped *a lot* in its maintenance (and by maintenance > I mean not-funny-work). When our community was getting smaller and smaller, > he joined us and our review velicity went back to eleven. He became a core > maintainer very quickly and we're glad to have him onboard. > > > > He's also been involved in taking care of puppet-tripleo for a few months > and I believe he has more than enough knowledge on the module to provide > core reviews and be part of the core maintainer group. I also noticed his > amount of contribution (bug fixes, improvements, reviews, etc) in other > TripleO repos and I'm confident he'll make his road to be core in TripleO > at some point. For now I would like him to propose him to be core in > puppet-tripleo. > > > > As usual, any feedback is welcome but in the meantime I want to thank > Takashi for his work in TripleO and we're super happy to have new > contributors! > > > > Thanks, > > -- > > Emilien Macchi > > > > > -- > > Regards, > > Rabi Mishra > > > > > > +1, thanks for your contributions Takashi! > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From satish.txt at gmail.com Mon Aug 24 15:47:51 2020 From: satish.txt at gmail.com (Satish Patel) Date: Mon, 24 Aug 2020 11:47:51 -0400 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: References: <20200806144016.GP31915@sync> <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <28f04c4eff84aa6d15424f3de3706ae9ec361fa7.camel@redhat.com> Message-ID: Sorry for the late reply Sean, When you said Cells is only a nova feature what does that mean? Correct me if i am wrong here, only nova means i can deploy rabbitmq in cells to just handl nova-* services but not neutron or any other services right? On Sun, Aug 16, 2020 at 9:37 AM Sean Mooney wrote: > > On Sat, 2020-08-15 at 20:13 -0400, Satish Patel wrote: > > Hi Sean, > > > > Sounds good, but running rabbitmq for each service going to be little > > overhead also, how do you scale cluster (Yes we can use cellv2 but its > > not something everyone like to do because of complexity). > > my understanding is that when using rabbitmq adding multiple rabbitmq servers in a cluster lowers > througput vs jsut 1 rabbitmq instance for any given excahnge. that is because the content of > the queue need to be syconised across the cluster. so if cinder nova and neutron share > a 3 node cluster and your compaure that to the same service deployed with cinder nova and neuton > each having there on rabbitmq service then the independent deployment will tend to out perform the > clustered solution. im not really sure if that has change i know tha thow clustering has been donw has evovled > over the years but in the past clustering was the adversary of scaling. > > > If we thinks > > rabbitMQ is growing pain then why community not looking for > > alternative option (kafka) etc..? > we have looked at alternivives several times > rabbit mq wroks well enough ans scales well enough for most deployments. > there other amqp implimantation that scale better then rabbit, > activemq and qpid are both reported to scale better but they perfrom worse > out of the box and need to be carfully tuned > > in the past zeromq has been supported but peole did not maintain it. > > kafka i dont think is a good alternative but nats https://nats.io/ might be. > > for what its worth all nova deployment are cellv2 deployments with 1 cell from around pike/rocky > and its really not that complex. cells_v1 was much more complex bug part of the redesign > for cells_v2 was makeing sure there is only 1 code path. adding a second cell just need another > cell db and conductor to be deployed assuming you startted with a super conductor in the first > place. the issue is cells is only a nova feature no other service have cells so it does not help > you with cinder or neutron. as such cinder an neutron likely be the services that hit scaling limits first. > adopign cells in other services is not nessaryally the right approch either but when we talk about scale > we do need to keep in mind that cells is just for nova today. > > > > > > On Fri, Aug 14, 2020 at 3:09 PM Sean Mooney wrote: > > > > > > On Fri, 2020-08-14 at 18:45 +0200, Fabian Zimmermann wrote: > > > > Hi, > > > > > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > > > one rabbitmq Container per Service. Just the kubernetes self healing is > > > > used as "ha" for rabbitmq. > > > > > > > > That seems to match with my finding: run rabbitmq standalone and use an > > > > external system to restart rabbitmq if required. 
> > > > > > thats the design that was orginally planned for kolla-kubernetes orrignally > > > > > > each service was to be deployed with its own rabbit mq server if it required one > > > and if it crashed it woudl just be recreated by k8s. it perfromace better then a cluster > > > and if you trust k8s or the external service enough to ensure it is recteated it > > > should be as effective a solution. you dont even need k8s to do that but it seams to be > > > a good fit if your prepared to ocationally loose inflight rpcs. > > > if you not then you can configure rabbit to persite all message to disk and mont that on a shared > > > file system like nfs or cephfs so that when the rabbit instance is recreated the queue contency is > > > perserved. assuming you can take the perfromance hit of writing all messages to disk that is. > > > > > > > > Fabian > > > > > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > > > > > Fabian, > > > > > > > > > > what do you mean? > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > reasons. > > > > > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > > > wrote: > > > > > > > > > > > > Hello again, > > > > > > > > > > > > just a short update about the results of my tests. > > > > > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > > > > > 1. without durable-queues and without replication - just one > > > > > > > > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > > > 2. durable-queues and replication > > > > > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > > > > > * broken / non working bindings > > > > > > * broken queues > > > > > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > > > > > > > > reasons. > > > > > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > > > > > > > > replication but without durable-queues. > > > > > > > > > > > > May someone point me to the best way to document these findings to some > > > > > > > > > > official doc? > > > > > > I think a lot of installations out there will run into issues if - under > > > > > > > > > > load - a node fails. > > > > > > > > > > > > Fabian > > > > > > > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > > > > > > > > dev.faz at gmail.com>: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > just did some short tests today in our test-environment (without > > > > > > > > > > durable queues and without replication): > > > > > > > > > > > > > > * started a rally task to generate some load > > > > > > > * kill-9-ed rabbitmq on one node > > > > > > > * rally task immediately stopped and the cloud (mostly) stopped working > > > > > > > > > > > > > > after some debugging i found (again) exchanges which had bindings to > > > > > > > > > > queues, but these bindings didnt forward any msgs. > > > > > > > Wrote a small script to detect these broken bindings and will now check > > > > > > > > > > if this is "reproducible" > > > > > > > > > > > > > > then I will try "durable queues" and "durable queues with replication" > > > > > > > > > > to see if this helps. Even if I would expect > > > > > > > rabbitmq should be able to handle this without these "hidden broken > > > > > > > > > > bindings" > > > > > > > > > > > > > > This just FYI. 
> > > > > > > > > > > > > > Fabian > > > > > From tonyliu0592 at hotmail.com Mon Aug 24 16:53:02 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Mon, 24 Aug 2020 16:53:02 +0000 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> Message-ID: > -----Original Message----- > From: Mark Goddard > Sent: Monday, August 24, 2020 12:46 AM > To: Eric K. Miller > Cc: openstack-discuss > Subject: Re: [Kolla Ansible] host maintenance > > On Sat, 22 Aug 2020 at 01:10, Eric K. Miller > wrote: > > > > > Actually, in my case, the setup is originally deploy by Kolla > > > Ansible. Other than the initial deployment, I am looking for using > > > Kolla Ansible for maintenance operations. > > > What I am looking for, eg. replace a host, can surely be done by > > > manual steps or customized script. I'd like to know if they are > > > automated by Kolla Ansible. > > > > We do this often by simply using the "limit" flag in Kolla Ansible to > only include the controllers and new compute node (after adding the > compute node to the multinode.ini file). Specify "reconfigure" for the > action, and not "install". > > We need some better docs around this, and I think they will be added > soon. Some things to watch out for: > > * if adding a new controller, ensure that if using --limit, all > controllers are included and do not use serial mode What I tried was to replace a controller, where I don't need to update other controllers, because there is no address update. If there is address update caused by controller change, then all controllers have to be included to get update. What's "serial mode"? > * if removing a controller, reconfigure other controllers to update the > RabbitMQ & Galera cluster nodes etc. In this case, are those services who don't need any updates going to be restarted or untouched? Thanks! Tony From mark at stackhpc.com Mon Aug 24 18:20:46 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 24 Aug 2020 19:20:46 +0100 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> Message-ID: On Mon, 24 Aug 2020 at 17:53, Tony Liu wrote: > > > -----Original Message----- > > From: Mark Goddard > > Sent: Monday, August 24, 2020 12:46 AM > > To: Eric K. Miller > > Cc: openstack-discuss > > Subject: Re: [Kolla Ansible] host maintenance > > > > On Sat, 22 Aug 2020 at 01:10, Eric K. Miller > > wrote: > > > > > > > Actually, in my case, the setup is originally deploy by Kolla > > > > Ansible. Other than the initial deployment, I am looking for using > > > > Kolla Ansible for maintenance operations. > > > > What I am looking for, eg. replace a host, can surely be done by > > > > manual steps or customized script. I'd like to know if they are > > > > automated by Kolla Ansible. > > > > > > We do this often by simply using the "limit" flag in Kolla Ansible to > > only include the controllers and new compute node (after adding the > > compute node to the multinode.ini file). Specify "reconfigure" for the > > action, and not "install". > > > > We need some better docs around this, and I think they will be added > > soon. Some things to watch out for: > > > > * if adding a new controller, ensure that if using --limit, all > > controllers are included and do not use serial mode > > What I tried was to replace a controller, where I don't need to > update other controllers, because there is no address update. 
> > If there is address update caused by controller change, then all > controllers have to be included to get update. While this may work at the moment, we have just merged a change that prevents this. For keystone, we need access to all controllers, to determine whether it is a new cluster or a new node in an existing cluster. > > What's "serial mode"? Ansible has a feature to run plays in batches of some % of the hosts. In Kolla Ansible you can e.g. export ANSIBLE_SERIAL=0.1. It's an advanced use case and needs some care. > > > * if removing a controller, reconfigure other controllers to update the > > RabbitMQ & Galera cluster nodes etc. > > In this case, are those services who don't need any updates going > to be restarted or untouched? > > Thanks! > Tony > From tonyliu0592 at hotmail.com Mon Aug 24 18:50:04 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Mon, 24 Aug 2020 18:50:04 +0000 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> Message-ID: > -----Original Message----- > From: Mark Goddard > Sent: Monday, August 24, 2020 11:21 AM > To: Tony Liu > Cc: Eric K. Miller ; openstack-discuss > > Subject: Re: [Kolla Ansible] host maintenance > > On Mon, 24 Aug 2020 at 17:53, Tony Liu wrote: > > > > > -----Original Message----- > > > From: Mark Goddard > > > Sent: Monday, August 24, 2020 12:46 AM > > > To: Eric K. Miller > > > Cc: openstack-discuss > > > Subject: Re: [Kolla Ansible] host maintenance > > > > > > On Sat, 22 Aug 2020 at 01:10, Eric K. Miller > > > > > > wrote: > > > > > > > > > Actually, in my case, the setup is originally deploy by Kolla > > > > > Ansible. Other than the initial deployment, I am looking for > > > > > using Kolla Ansible for maintenance operations. > > > > > What I am looking for, eg. replace a host, can surely be done by > > > > > manual steps or customized script. I'd like to know if they are > > > > > automated by Kolla Ansible. > > > > > > > > We do this often by simply using the "limit" flag in Kolla Ansible > > > > to > > > only include the controllers and new compute node (after adding the > > > compute node to the multinode.ini file). Specify "reconfigure" for > > > the action, and not "install". > > > > > > We need some better docs around this, and I think they will be added > > > soon. Some things to watch out for: > > > > > > * if adding a new controller, ensure that if using --limit, all > > > controllers are included and do not use serial mode > > > > What I tried was to replace a controller, where I don't need to update > > other controllers, because there is no address update. > > > > If there is address update caused by controller change, then all > > controllers have to be included to get update. > > While this may work at the moment, we have just merged a change that > prevents this. For keystone, we need access to all controllers, to > determine whether it is a new cluster or a new node in an existing > cluster. > > > > > What's "serial mode"? > > Ansible has a feature to run plays in batches of some % of the hosts. > In Kolla Ansible you can e.g. export ANSIBLE_SERIAL=0.1. It's an > advanced use case and needs some care. > > > > > > * if removing a controller, reconfigure other controllers to update > > > the RabbitMQ & Galera cluster nodes etc. > > > > In this case, are those services who don't need any updates going to > > be restarted or untouched? Could you comment on this? This is my biggest concern. 
I'd like to ensure services who don't need update remain untouched. Thanks! Tony From mnaser at vexxhost.com Mon Aug 24 18:54:40 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 24 Aug 2020 14:54:40 -0400 Subject: [nova][neutron][oslo][ops][kolla] rabbit bindings issue In-Reply-To: <20200818120708.GV31915@sync> References: <1a338d7e-c82c-cda2-2d47-b5aebb999142@openstack.org> <20200818120708.GV31915@sync> Message-ID: On Tue, Aug 18, 2020 at 8:11 AM Arnaud Morin wrote: > > Hey all, > > About the vexxhost strategy to use only one rabbit server and manage HA through > rabbit. > Do you plan to do the same for MariaDB/MySQL? We use a MySQL operator to deploy a good o'l master/slave replication cluster and point towards the master, for every service, for two reasons: 1) We always pointed to a master Galera system anyways, multi-master was overcomplicated for no real advantage 2) The failover time vs the complexity of Galera (and how often we failover) favours #1 3) We use "orchestrator" by GitHub which manages all the promotions/etc for us > -- > Arnaud Morin > > On 14.08.20 - 18:45, Fabian Zimmermann wrote: > > Hi, > > > > i read somewhere that vexxhosts kubernetes openstack-Operator is running > > one rabbitmq Container per Service. Just the kubernetes self healing is > > used as "ha" for rabbitmq. > > > > That seems to match with my finding: run rabbitmq standalone and use an > > external system to restart rabbitmq if required. > > > > Fabian > > > > Satish Patel schrieb am Fr., 14. Aug. 2020, 16:59: > > > > > Fabian, > > > > > > what do you mean? > > > > > > >> I think vexxhost is running (1) with their openstack-operator - for > > > reasons. > > > > > > On Fri, Aug 14, 2020 at 7:28 AM Fabian Zimmermann > > > wrote: > > > > > > > > Hello again, > > > > > > > > just a short update about the results of my tests. > > > > > > > > I currently see 2 ways of running openstack+rabbitmq > > > > > > > > 1. without durable-queues and without replication - just one > > > rabbitmq-process which gets (somehow) restarted if it fails. > > > > 2. durable-queues and replication > > > > > > > > Any other combination of these settings leads to more or less issues with > > > > > > > > * broken / non working bindings > > > > * broken queues > > > > > > > > I think vexxhost is running (1) with their openstack-operator - for > > > reasons. > > > > > > > > I added [kolla], because kolla-ansible is installing rabbitmq with > > > replication but without durable-queues. > > > > > > > > May someone point me to the best way to document these findings to some > > > official doc? > > > > I think a lot of installations out there will run into issues if - under > > > load - a node fails. > > > > > > > > Fabian > > > > > > > > > > > > Am Do., 13. Aug. 2020 um 15:13 Uhr schrieb Fabian Zimmermann < > > > dev.faz at gmail.com>: > > > >> > > > >> Hi, > > > >> > > > >> just did some short tests today in our test-environment (without > > > durable queues and without replication): > > > >> > > > >> * started a rally task to generate some load > > > >> * kill-9-ed rabbitmq on one node > > > >> * rally task immediately stopped and the cloud (mostly) stopped working > > > >> > > > >> after some debugging i found (again) exchanges which had bindings to > > > queues, but these bindings didnt forward any msgs. 
> > > >> Wrote a small script to detect these broken bindings and will now check > > > if this is "reproducible" > > > >> > > > >> then I will try "durable queues" and "durable queues with replication" > > > to see if this helps. Even if I would expect > > > >> rabbitmq should be able to handle this without these "hidden broken > > > bindings" > > > >> > > > >> This just FYI. > > > >> > > > >> Fabian > > > > -- Mohammed Naser VEXXHOST, Inc. From pierre at stackhpc.com Mon Aug 24 19:30:25 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 24 Aug 2020 21:30:25 +0200 Subject: [blazar] IRC meetings cancelled this week Message-ID: Hello, Apologies for the short notice: due to scheduling conflicts, I am not available to chair either of the Blazar IRC meetings this week. I propose that we cancel them. Thanks, Pierre Riteau (priteau) From sbaker at redhat.com Mon Aug 24 21:55:23 2020 From: sbaker at redhat.com (Steve Baker) Date: Tue, 25 Aug 2020 09:55:23 +1200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> Message-ID: On 25/08/20 12:05 am, Dmitry Tantsur wrote: > > > On Mon, Aug 24, 2020 at 1:52 PM Sean Mooney > wrote: > > On Mon, 2020-08-24 at 10:32 +0200, Dmitry Tantsur wrote: > > Hi, > > > > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck > > > > wrote: > > > > > Hi! > > > > > > CERN's deployment is using the iscsi deploy interface since we > started > > > with Ironic a couple of years ago (and we installed around > 5000 nodes > > > with it by now). The reason we chose it at the time was > simplicity: we > > > did not (and still do not) have a Swift backend to Glance, and > the iscsi > > > interface provided a straightforward alternative. > > > > > > While we have not seen obscure bugs/issues with it, I can > certainly back > > > the scalability issues mentioned by Dmitry: the tunneling of > the images > > > through the controllers can create issues when deploying > hundreds of > > > nodes at the same time. The security of the iscsi interface is > less of a > > > concern in our specific environment. > > > > > > So, why did we not move to direct (yet)? In addition to the > lack of > > > Swift, mostly since iscsi works for us and the scalability > issues were > > > not that much of a burning problem ... so we focused on other > things :) > > > > > > Here are some thoughts/suggestions for this discussion: > > > > > > How would 'direct' work with other Glance backends (like > Ceph/RBD in our > > > case)? If using direct requires to duplicate images from Glance to > > > Ironic (or somewhere else) to be served, I think this would be an > > > argument against deprecating iscsi. > > > > > > > With image_download_source=http ironic will download the image > to the > > conductor to be able serve it to the node. Which is exactly what > the iscsi > > is doing, so not much of a change for you (except for > s/iSCSI/HTTP/ as a > > means of serving the image). > > > > Would it be an option for you to test direct deploy with > > image_download_source=http? > i think if there is still an option to not force deployemnt to > altere any of there > other sevices this is likely ok but i think the onious shoudl be > on the ironic > and ooo teams to ensure there is an upgrade path for those useres > before this deprecation > becomes a removal without deploying swift or a swift compatibale > api e.g. RadosGW > > > Swift is NOT a requirement (nor is RadosGW) when > image_download_source=http is used. 
Any glance backend (or no glance > at all) will work. Even though the TripleO undercloud has swift, I'd be inclined to do image_download_source=http so that it can scale out to minions, and so we're not relying on a single-node swift for image serving -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Mon Aug 24 23:02:17 2020 From: tkajinam at redhat.com (Takashi Kajinami) Date: Tue, 25 Aug 2020 08:02:17 +0900 Subject: [tripleo] Proposing Takashi Kajinami to be core on puppet-tripleo In-Reply-To: References: <1597847922905.32607@binero.com> Message-ID: Thank you, Emilien and the others who shared your kind feedback. It's my great pleasure and honor to have this happen. I'll keep doing my best to make more contribution to TripleO project, On Tue, Aug 25, 2020 at 12:30 AM Emilien Macchi wrote: > I went ahead and added Takashi to the newly created puppet-tripleo core > group in Gerrit. > > Thanks again for your hard work! > > On Thu, Aug 20, 2020 at 9:25 AM Karthik, Rajini > wrote: > >> +1 . >> >> >> >> Rajini >> >> >> >> *From:* Wesley Hayutin >> *Sent:* Wednesday, August 19, 2020 9:09 PM >> *To:* openstack-discuss >> *Cc:* Emilien Macchi >> *Subject:* Re: [tripleo] Proposing Takashi Kajinami to be core on >> puppet-tripleo >> >> >> >> [EXTERNAL EMAIL] >> >> >> >> >> >> On Wed, Aug 19, 2020 at 8:40 AM Tobias Urdin >> wrote: >> >> Big +1 from an outsider :)) >> >> >> >> Best regards >> >> Tobias >> >> >> ------------------------------ >> >> *From:* Rabi Mishra >> *Sent:* Wednesday, August 19, 2020 3:37 PM >> *To:* Emilien Macchi >> *Cc:* openstack-discuss >> *Subject:* Re: [tripleo] Proposing Takashi Kajinami to be core on >> puppet-tripleo >> >> >> >> +1 >> >> >> >> On Tue, Aug 18, 2020 at 8:03 PM Emilien Macchi >> wrote: >> >> Hi people, >> >> >> >> If you don't know Takashi yet, he has been involved in the Puppet >> OpenStack project and helped *a lot* in its maintenance (and by maintenance >> I mean not-funny-work). When our community was getting smaller and smaller, >> he joined us and our review velicity went back to eleven. He became a core >> maintainer very quickly and we're glad to have him onboard. >> >> >> >> He's also been involved in taking care of puppet-tripleo for a few months >> and I believe he has more than enough knowledge on the module to provide >> core reviews and be part of the core maintainer group. I also noticed his >> amount of contribution (bug fixes, improvements, reviews, etc) in other >> TripleO repos and I'm confident he'll make his road to be core in TripleO >> at some point. For now I would like him to propose him to be core in >> puppet-tripleo. >> >> >> >> As usual, any feedback is welcome but in the meantime I want to thank >> Takashi for his work in TripleO and we're super happy to have new >> contributors! >> >> >> >> Thanks, >> >> -- >> >> Emilien Macchi >> >> >> >> >> -- >> >> Regards, >> >> Rabi Mishra >> >> >> >> >> >> +1, thanks for your contributions Takashi! >> > > > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam47priya at gmail.com Tue Aug 25 04:29:50 2020 From: sam47priya at gmail.com (Sam P) Date: Tue, 25 Aug 2020 13:29:50 +0900 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: Hi All, I add the following members to the core team. 
Fabian Zimmermann dev.faz at googlemail.com Jegor van Opdorpjegor at greenedge.cloud Radosław Piliszekradoslaw.piliszek at gmail.com suzhengweisugar-2008 at 163.com Please let me or other core members know if any one else would like to join the core team. --- Regards, Sampath On Sat, Aug 22, 2020 at 2:08 AM Fabian Zimmermann wrote: > > Hi, > > As long as there are enough cores to keep the project running everything is fine :) > > Fabian > > Jean-Philippe Evrard schrieb am Fr., 21. Aug. 2020, 16:32: >> >> >> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: >> > Hi, >> > >> > if nobody complains I also would like to request core status to help getting the project further. >> > >> > Fabian Zimmermann >> >> Let's hope this will not be lost in the list :) >> From sam47priya at gmail.com Tue Aug 25 04:47:25 2020 From: sam47priya at gmail.com (Sam P) Date: Tue, 25 Aug 2020 13:47:25 +0900 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: Thank you all for volunteering to maintain the project. > Please let me know how we should proceed with the meetings. > I can start them on Tuesdays at 7 AM UTC. > And since the Masakari own channel is quite a peaceful one, I would > suggest to run them there directly. > What are your thoughts? :-) I think #openstack-masakari channel is all set to conduct the meeting. I am totally OK with that. And Tuesday at 7AM UTC also works for me. Previously we conducted the meeting every two weeks (on even weeks). How about others? Please add comments to the following review. https://review.opendev.org/#/c/747819/ --- Regards, Sampath On Tue, Aug 25, 2020 at 1:29 PM Sam P wrote: > > Hi All, > > I add the following members to the core team. > > Fabian Zimmermann dev.faz at googlemail.com > Jegor van Opdorpjegor at greenedge.cloud > Radosław Piliszekradoslaw.piliszek at gmail.com > suzhengweisugar-2008 at 163.com > > Please let me or other core members know if any one else would like to > join the core team. > --- Regards, > Sampath > > On Sat, Aug 22, 2020 at 2:08 AM Fabian Zimmermann wrote: > > > > Hi, > > > > As long as there are enough cores to keep the project running everything is fine :) > > > > Fabian > > > > Jean-Philippe Evrard schrieb am Fr., 21. Aug. 2020, 16:32: > >> > >> > >> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: > >> > Hi, > >> > > >> > if nobody complains I also would like to request core status to help getting the project further. > >> > > >> > Fabian Zimmermann > >> > >> Let's hope this will not be lost in the list :) > >> From yasufum.o at gmail.com Tue Aug 25 05:27:51 2020 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 25 Aug 2020 14:27:51 +0900 Subject: [tacker] IRC meeting Message-ID: <636004ca-130b-58ee-c769-19169926fcee@gmail.com> Hi tacker team, I am not available to join IRC meeting today unfortunately. I would like to suggest to anyone host the meeting, or skip it if no items. Thanks, Yasufumi From luis.ramirez at opencloud.es Tue Aug 25 05:32:23 2020 From: luis.ramirez at opencloud.es (Luis Ramirez) Date: Tue, 25 Aug 2020 07:32:23 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: +1 I'll try to do my best. 
Please add me to the core Br, Luis Rmz Blockchain, DevOps & Open Source Cloud Solutions Architect ---------------------------------------- Founder & CEO OpenCloud.es luis.ramirez at opencloud.es Skype ID: d.overload Hangouts: luis.ramirez at opencloud.es [image: ] +34 911 950 123 / [image: ]+39 392 1289553 / [image: ]+49 152 26917722 / Česká republika: +420 774 274 882 ----------------------------------------------------- El mar., 25 ago. 2020 a las 6:52, Sam P () escribió: > Thank you all for volunteering to maintain the project. > > Please let me know how we should proceed with the meetings. > > I can start them on Tuesdays at 7 AM UTC. > > And since the Masakari own channel is quite a peaceful one, I would > > suggest to run them there directly. > > What are your thoughts? :-) > I think #openstack-masakari channel is all set to conduct the meeting. > I am totally OK with that. And Tuesday at 7AM UTC also works for me. > Previously we conducted the meeting every two weeks (on even weeks). > How about others? > Please add comments to the following review. > https://review.opendev.org/#/c/747819/ > > --- Regards, > Sampath > > On Tue, Aug 25, 2020 at 1:29 PM Sam P wrote: > > > > Hi All, > > > > I add the following members to the core team. > > > > Fabian Zimmermann dev.faz at googlemail.com > > Jegor van Opdorpjegor at greenedge.cloud > > Radosław Piliszekradoslaw.piliszek at gmail.com > > suzhengweisugar-2008 at 163.com > > > > Please let me or other core members know if any one else would like to > > join the core team. > > --- Regards, > > Sampath > > > > On Sat, Aug 22, 2020 at 2:08 AM Fabian Zimmermann > wrote: > > > > > > Hi, > > > > > > As long as there are enough cores to keep the project running > everything is fine :) > > > > > > Fabian > > > > > > Jean-Philippe Evrard schrieb am Fr., 21. > Aug. 2020, 16:32: > > >> > > >> > > >> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: > > >> > Hi, > > >> > > > >> > if nobody complains I also would like to request core status to > help getting the project further. > > >> > > > >> > Fabian Zimmermann > > >> > > >> Let's hope this will not be lost in the list :) > > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyliu0592 at hotmail.com Tue Aug 25 06:00:18 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Tue, 25 Aug 2020 06:00:18 +0000 Subject: [Monasca] Kolla docker image on Docker Hub Message-ID: Hi, Are those Monasca Kolla container images kolla/centos-binary-monasca-* on Docker Hub? I only see kolla/centos-binary-monasca-grafana. I am running Kolla Ansible to deploy Monasca and got this failure. ======== docker.errors.ImageNotFound: 404 Client Error: Not Found (\"pull access denied for kolla/centos-binary-monasca-api, repository does not exist or may require \\'docker login\\': denied: requested access to the resource is denied\") ======== Thanks! Tony From arnaud.morin at gmail.com Tue Aug 25 06:07:15 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Tue, 25 Aug 2020 06:07:15 +0000 Subject: [neutron][ops] q-agent-notifier exchanges without bindings. In-Reply-To: <20200824124154.GA31915@sync> References: <20200824124154.GA31915@sync> Message-ID: <20200825060715.GB31915@sync> Hi again, If I understand correctly neutron code, we have security group rule update notified twice: First with SecurityGroupServerNotifierRpcMixin [1] Second with ResourcesPushRpcApi [2] Can someone involved in neutron code confirm that? 
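(A rough sketch of how the broker side can be checked, for anyone who wants to reproduce this. It assumes the RabbitMQ management plugin and rabbitmqadmin are available and that neutron uses the default vhost - adjust credentials and vhost to your own setup.)

# list the neutron fanout exchanges
rabbitmqadmin list exchanges name | grep q-agent-notifier

# show what (if anything) is bound to one of them; an empty result
# means messages published to that exchange are unroutable
rabbitmqadmin list bindings source destination_type destination \
  | grep q-agent-notifier-security_group-update_fanout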
It seems that, in OVS agent implementation, [1] is not used (my agent is not consuming those messages), but neutron server is sending messages in this exchange. This is why I have unroutable messages. [1] https://github.com/openstack/neutron/blob/3793f1f3888a85fc5e48c0e94e6a9f3c05e95c43/neutron/db/securitygroups_rpc_base.py#L40 [2] https://github.com/openstack/neutron/blob/f8b990736ba91af098e467608c6dfa0b801ec19c/neutron/api/rpc/handlers/resources_rpc.py#L198 -- Arnaud Morin On 24.08.20 - 12:41, Arnaud Morin wrote: > Hey, > > I did exactly the same on my side. > I also have unroutable messages going in my alternate exchange, related > to the same exchanges (q-agent-notifier-security_group-update_fanout, > etc.) > > Did you figured out why you have unroutable messages like this? > Are you using a custom neutron driver? > > Cheers, > > -- > Arnaud Morin > > On 21.08.20 - 10:32, Fabian Zimmermann wrote: > > Hi, > > > > im currently on the way to analyse some rabbitmq-issues. > > > > atm im taking a look on "unroutable messages", so I > > > > * created an Alternative Exchange and Queue: "unroutable" > > * created a policy to send all unroutable msgs to this exchange/queue. > > * wrote a script to show me the msgs placed here.. currently I get > > > > Seems like my neutron is placing msgs in these exchanges, but there is > > nobody listening/binding to: > > -- > > 20 Exchange: q-agent-notifier-network-delete_fanout, RoutingKey: > > 226 Exchange: q-agent-notifier-port-delete_fanout, RoutingKey: > > 88 Exchange: q-agent-notifier-port-update_fanout, RoutingKey: > > 388 Exchange: q-agent-notifier-security_group-update_fanout, RoutingKey: > > -- > > > > Is someone able to give me a hint where to look at / how to debug this? > > > > Fabian > > From arne.wiebalck at cern.ch Tue Aug 25 06:30:14 2020 From: arne.wiebalck at cern.ch (Arne Wiebalck) Date: Tue, 25 Aug 2020 08:30:14 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> Message-ID: Hi Steve, On 24.08.20 23:55, Steve Baker wrote: > > On 25/08/20 12:05 am, Dmitry Tantsur wrote: >> >> >> On Mon, Aug 24, 2020 at 1:52 PM Sean Mooney > > wrote: >> >> On Mon, 2020-08-24 at 10:32 +0200, Dmitry Tantsur wrote: >> > Hi, >> > >> > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck >> > >> > wrote: >> > >> > > Hi! >> > > >> > > CERN's deployment is using the iscsi deploy interface since we >> started >> > > with Ironic a couple of years ago (and we installed around >> 5000 nodes >> > > with it by now). The reason we chose it at the time was >> simplicity: we >> > > did not (and still do not) have a Swift backend to Glance, and >> the iscsi >> > > interface provided a straightforward alternative. >> > > >> > > While we have not seen obscure bugs/issues with it, I can >> certainly back >> > > the scalability issues mentioned by Dmitry: the tunneling of >> the images >> > > through the controllers can create issues when deploying >> hundreds of >> > > nodes at the same time. The security of the iscsi interface is >> less of a >> > > concern in our specific environment. >> > > >> > > So, why did we not move to direct (yet)? In addition to the >> lack of >> > > Swift, mostly since iscsi works for us and the scalability >> issues were >> > > not that much of a burning problem ... 
so we focused on other >> things :) >> > > >> > > Here are some thoughts/suggestions for this discussion: >> > > >> > > How would 'direct' work with other Glance backends (like >> Ceph/RBD in our >> > > case)? If using direct requires to duplicate images from Glance to >> > > Ironic (or somewhere else) to be served, I think this would be an >> > > argument against deprecating iscsi. >> > > >> > >> > With image_download_source=http ironic will download the image >> to the >> > conductor to be able serve it to the node. Which is exactly what >> the iscsi >> > is doing, so not much of a change for you (except for >> s/iSCSI/HTTP/ as a >> > means of serving the image). >> > >> > Would it be an option for you to test direct deploy with >> > image_download_source=http? >> i think if there is still an option to not force deployemnt to >> altere any of there >> other sevices this is likely ok but i think the onious shoudl be >> on the ironic >> and ooo teams to ensure there is an upgrade path for those useres >> before this deprecation >> becomes a removal without deploying swift or a swift compatibale >> api e.g. RadosGW >> >> >> Swift is NOT a requirement (nor is RadosGW) when >> image_download_source=http is used. Any glance backend (or no glance >> at all) will work. > > Even though the TripleO undercloud has swift, I'd be inclined to do > image_download_source=http so that it can scale out to minions, and so > we're not relying on a single-node swift for image serving This makes it sound a little like 'direct' with image_download_source=http would be easily scalable ... but it is only if you can (and are willing to) scale the Ironic control plane through which the images are still tunneled (and Glance behind it ... not sure if there is any caching of images inside the Ironic controllers). Seems to be the case for you and TripleO, but it may not be the case in other setups, using conductor groups may complicated things, for instance. So, from what I see, image_download_source=http is a good option to move deployments off the iscsi deploy interface, but it does not bring the same (scalability) advantages you would get from a setup where Glance is backed by a scalable Swift or RadosGW backend. Cheers, Arne From radoslaw.piliszek at gmail.com Tue Aug 25 07:08:31 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 25 Aug 2020 09:08:31 +0200 Subject: [Monasca] Kolla docker image on Docker Hub In-Reply-To: References: Message-ID: Hi Tony, RDO does not package Monasca so it does not exist in the binary flavour (except for dedicated Grafana). Please consult [1]. Your immediate workaround is to use source flavour for Monasca. [1] https://docs.openstack.org/kolla/ussuri/support_matrix.html -yoctozepto On Tue, Aug 25, 2020 at 8:11 AM Tony Liu wrote: > > Hi, > > Are those Monasca Kolla container images > kolla/centos-binary-monasca-* on Docker Hub? > I only see kolla/centos-binary-monasca-grafana. > > I am running Kolla Ansible to deploy Monasca and got this failure. > ======== > docker.errors.ImageNotFound: 404 Client Error: Not Found (\"pull access denied for kolla/centos-binary-monasca-api, repository does not exist or may require \\'docker login\\': denied: requested access to the resource is denied\") > ======== > > Thanks! 
> Tony > > From eblock at nde.ag Tue Aug 25 07:42:12 2020 From: eblock at nde.ag (Eugen Block) Date: Tue, 25 Aug 2020 07:42:12 +0000 Subject: [horizon] default create_volume setting can't be changed In-Reply-To: <20200824141904.Horde.biUwyDcXRQDK2D0KW6vwbE1@webmail.nde.ag> Message-ID: <20200825074212.Horde.X1xxti0Yt3f-evdwG6CWJyC@webmail.nde.ag> Update: I found one (the right?) place to change the default to false: /srv/www/openstack-dashboard/static/dashboard/project/workflow/launch-instance/launch-instance-model.service.js // create_volume_default: true, create_volume_default: false, I've been struggling for years now with these dashboard settings, it started with /srv/www/openstack-dashboard/openstack_dashboard/dashboards/project/instances/workflows/create_instance.py where I needed to remove the default disk identifier (hard-coded "vda") when we were still running with xen hypervisors to let nova change the disk name. Then this changed and it had to be one of these files, I can't remember which was first, I just know that after some months I had to apply my changes to the other file, too: /srv/www/openstack-dashboard/static/dashboard/project/workflow/launch-instance/launch-instance-model.service.js /srv/www/openstack-dashboard/openstack_dashboard/dashboards/project/static/dashboard/project/workflow/launch-instance/source/source.controller.js I'm not a developer but I must say, I don't really understand this setup and why it changes all the time. Of course I might be looking in the wrong places, it would be great if someone could point me to the right direction! I'm also willing to provide more information if necessary. > Other configs from this file work as expected, so that custom file > can't be the reason. I might be wrong about that, too. I noticed that although I disabled the debug settings in /srv/www/openstack-dashboard/openstack_dashboard/local/local_settings.d/_100_local_settings.py DEBUG = False I was still seeing debug messages. I had to turn them off in /srv/www/openstack-dashboard/openstack_dashboard/settings.py to be applied. So there might be other changes not applied from our custom config file. I'd really appreciate it if anyone could comment on this. Thanks, Eugen Zitat von Eugen Block : > Hi *, > > we recently upgraded from Ocata to Train and I'm struggling with a > specific setting: I believe since Pike version the default for > "create_volume" changed to "true" when launching instances from > Horizon dashboard. I would like to change that to "false" and set it > in our custom > /srv/www/openstack-dashboard/openstack_dashboard/local/local_settings.d/_100_local_settings.py: > > > LAUNCH_INSTANCE_DEFAULTS = { > 'config_drive': False, > 'create_volume': False, > 'hide_create_volume': False, > 'disable_image': False, > 'disable_instance_snapshot': False, > 'disable_volume': False, > 'disable_volume_snapshot': False, > 'enable_scheduler_hints': True, > } > > Other configs from this file work as expected, so that custom file > can't be the reason. > After apache and memcached restart nothing changes, the default is > still "true". Can anyone shed some light, please? I haven't tried > other configs yet so I can't tell if more options are affected. > > Thanks! 
> Eugen From mark at stackhpc.com Tue Aug 25 07:42:18 2020 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Aug 2020 08:42:18 +0100 Subject: [Monasca] Kolla docker image on Docker Hub In-Reply-To: References: Message-ID: On Tue, 25 Aug 2020 at 08:10, Radosław Piliszek wrote: > > Hi Tony, > > RDO does not package Monasca so it does not exist in the binary > flavour (except for dedicated Grafana). > > Please consult [1]. > Your immediate workaround is to use source flavour for Monasca. > > [1] https://docs.openstack.org/kolla/ussuri/support_matrix.html > > -yoctozepto > > On Tue, Aug 25, 2020 at 8:11 AM Tony Liu wrote: > > > > Hi, > > > > Are those Monasca Kolla container images > > kolla/centos-binary-monasca-* on Docker Hub? > > I only see kolla/centos-binary-monasca-grafana. > > > > I am running Kolla Ansible to deploy Monasca and got this failure. > > ======== > > docker.errors.ImageNotFound: 404 Client Error: Not Found (\"pull access denied for kolla/centos-binary-monasca-api, repository does not exist or may require \\'docker login\\': denied: requested access to the resource is denied\") > > ======== Please follow the kolla documentation for deploying monasca, which includes forcing the use of source images: https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/monasca-guide.html. > > > > Thanks! > > Tony > > > > > From dtantsur at redhat.com Tue Aug 25 07:46:42 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Tue, 25 Aug 2020 09:46:42 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> References: <0800d06e870cc5370ada0a85c5e4aaf3b329107d.camel@redhat.com> Message-ID: On Mon, Aug 24, 2020 at 1:52 PM Sean Mooney wrote: > On Mon, 2020-08-24 at 10:32 +0200, Dmitry Tantsur wrote: > > Hi, > > > > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck > > wrote: > > > > > Hi! > > > > > > CERN's deployment is using the iscsi deploy interface since we started > > > with Ironic a couple of years ago (and we installed around 5000 nodes > > > with it by now). The reason we chose it at the time was simplicity: we > > > did not (and still do not) have a Swift backend to Glance, and the > iscsi > > > interface provided a straightforward alternative. > > > > > > While we have not seen obscure bugs/issues with it, I can certainly > back > > > the scalability issues mentioned by Dmitry: the tunneling of the images > > > through the controllers can create issues when deploying hundreds of > > > nodes at the same time. The security of the iscsi interface is less of > a > > > concern in our specific environment. > > > > > > So, why did we not move to direct (yet)? In addition to the lack of > > > Swift, mostly since iscsi works for us and the scalability issues were > > > not that much of a burning problem ... so we focused on other things :) > > > > > > Here are some thoughts/suggestions for this discussion: > > > > > > How would 'direct' work with other Glance backends (like Ceph/RBD in > our > > > case)? If using direct requires to duplicate images from Glance to > > > Ironic (or somewhere else) to be served, I think this would be an > > > argument against deprecating iscsi. > > > > > > > With image_download_source=http ironic will download the image to the > > conductor to be able serve it to the node. Which is exactly what the > iscsi > > is doing, so not much of a change for you (except for s/iSCSI/HTTP/ as a > > means of serving the image). 
> > > > Would it be an option for you to test direct deploy with > > image_download_source=http? > i think if there is still an option to not force deployemnt to altere any > of there > other sevices this is likely ok but i think the onious shoudl be on the > ironic > and ooo teams to ensure there is an upgrade path for those useres before > this deprecation > becomes a removal without deploying swift or a swift compatibale api e.g. > RadosGW > > perhaps a ci job could be put in place maybe using grenade that starts > with iscsi and moves > to direct with http porvided to show that just setting that weill allow > the conductor to download > the image from glance and server it to the ipa. > This is the CI job with direct deploy in a low RAM environment with a large image (CentOS) without Swift: https://zuul.opendev.org/t/openstack/build/58f623d90435470f9095eb68202c25f8 The change is https://review.opendev.org/#/c/747413/ Dmitry > > > unlike cern i just use ironic in a tiny home deployment where i have an > all in one deployment + 4 addtional > nodes for ironic. i cant deploy swift as all my disks are already in use > for cinder so down the line when > i eventually upgrade to vicortia and wallaby i would either have to drop > ironic or not upgrade it > if there is not a option to just pull the image from glance or glance via > the conductor. enhancing the ipa > to pull directly from glance would also proably work for many who use > iscsi today but that would depend on your network > toplogy i guess. > > > > > > > > > > Equally, if this would require to completely move the Glance backend to > > > something else, like from RBD to RadosGW, I would not expect happy > > > operators. (Does anyone know if RadosGW could even replace Swift for > > > this specific use case?) > > > > > > > AFAIK ironic works with RadosGW, we have some support code for it. > > > > > > > > > > Do we have numbers on how many deployments use iscsi vs direct? If many > > > rely on iscsi, I would also suggest to establish a migration guide for > > > operators on how to move from iscsi to direct, for the various configs. > > > Recent versions of Glance support multiple backends, so a migration > path > > > may be to add a new (direct compatible) backend for new images. > > > > > > > I don't have any numbers, but a migration guide is a must in any case. > > > > I expect most TripleO consumers to use the iscsi deploy, but only because > > it's the default. Their Edge solution uses the direct deploy. I've > polled a > > few operators I know, they all (except for you, obviously :) seem to use > > the direct deploy. Metal3 uses direct deploy. > > > > Dmitry > > > > > > > > > > Cheers, > > > Arne > > > > > > On 20.08.20 17:49, Julia Kreger wrote: > > > > I'm having a sense of deja vu! > > > > > > > > Because of the way the mechanics work, the iscsi deploy driver is in > > > > an unfortunate position of being harder to troubleshoot and diagnose > > > > failures. Which basically means we've not been able to really > identify > > > > common failures and add logic to handle them appropriately, like we > > > > are able to with a tcp socket and file download. Based on this alone, > > > > I think it makes a solid case for us to seriously consider > > > > deprecation. > > > > > > > > Overall, I'm +1 for the proposal and I believe over two cycles is the > > > > right way to go. 
> > > > > > > > I suspect we're going to have lots of push back from the TripleO > > > > community because there has been resistance to change their default > > > > usage in the past. As such I'm adding them to the subject so > hopefully > > > > they will be at least aware. > > > > > > > > I guess my other worry is operators who already have a substantial > > > > operational infrastructure investment built around the iscsi deploy > > > > interface. I wonder why they didn't use direct, but maybe they have > > > > all migrated in the past ?5? years. This could just be a non-concern > > > > in reality, I'm just not sure. > > > > > > > > Of course, if someone is willing to step up and make the iscsi > > > > deployment interface their primary focus, that also shifts the > > > > discussion to making direct the default interface? > > > > > > > > -Julia > > > > > > > > > > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur > > > > > > wrote: > > > > > > > > > > Hi all, > > > > > > > > > > Side note for those lacking context: this proposal concerns > deprecating > > > > > > one of the ironic deploy interfaces detailed in > > > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. > It > > > does not affect the boot-from-iSCSI feature. > > > > > > > > > > I would like to propose deprecating and removing the 'iscsi' deploy > > > > > > interface over the course of the next 2 cycles. The reasons are: > > > > > 1) The iSCSI deploy is a source of occasional cryptic bugs when a > > > > > > target cannot be discovered or mounted properly. > > > > > 2) Its security is questionable: I don't think we even use > > > > > > authentication. > > > > > 3) Operators confusion: right now we default to the iSCSI deploy > but > > > > > > pretty much direct everyone who cares about scalability or security to > the > > > 'direct' deploy. > > > > > 4) Cost of maintenance: our feature set is growing, our team - not > so > > > > > > much. iscsi_deploy.py is 800 lines of code that can be removed, and > some > > > dependencies that can be dropped as well. > > > > > > > > > > As far as I can remember, we've kept the iSCSI deploy for two > reasons: > > > > > 1) The direct deploy used to require Glance with Swift backend. The > > > > > > recently added [agent]image_download_source option allows caching and > > > serving images via the ironic's HTTP server, eliminating this problem. > I > > > guess we'll have to switch to 'http' by default for this option to > keep the > > > out-of-box experience. > > > > > 2) Memory footprint of the direct deploy. With the raw images > streaming > > > > > > we no longer have to cache the downloaded images in the agent memory, > > > removing this problem as well (I'm not even sure how much of a problem > it > > > is in 2020, even my phone has 4GiB of RAM). > > > > > > > > > > If this proposal is accepted, I suggest to execute it as follows: > > > > > Victoria release: > > > > > 1) Put an early deprecation warning in the release notes. > > > > > 2) Announce the future change of the default value for > > > > > > [agent]image_download_source. > > > > > W release: > > > > > 3) Change [agent]image_download_source to 'http' by default. > > > > > 4) Remove iscsi from the default enabled_deploy_interfaces and > move it > > > > > > to the back of the supported list (effectively making direct deploy the > > > default). > > > > > X release: > > > > > 5) Remove the iscsi deploy code from both ironic and IPA. > > > > > > > > > > Thoughts, opinions, suggestions? 
> > > > > > > > > > Dmitry > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Aug 25 07:55:08 2020 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 25 Aug 2020 08:55:08 +0100 Subject: [Kolla Ansible] host maintenance In-Reply-To: References: <046E9C0290DD9149B106B72FC9156BEA04814569@gmsxchsvr01.thecreation.com> Message-ID: On Mon, 24 Aug 2020 at 19:50, Tony Liu wrote: > > > -----Original Message----- > > From: Mark Goddard > > Sent: Monday, August 24, 2020 11:21 AM > > To: Tony Liu > > Cc: Eric K. Miller ; openstack-discuss > > > > Subject: Re: [Kolla Ansible] host maintenance > > > > On Mon, 24 Aug 2020 at 17:53, Tony Liu wrote: > > > > > > > -----Original Message----- > > > > From: Mark Goddard > > > > Sent: Monday, August 24, 2020 12:46 AM > > > > To: Eric K. Miller > > > > Cc: openstack-discuss > > > > Subject: Re: [Kolla Ansible] host maintenance > > > > > > > > On Sat, 22 Aug 2020 at 01:10, Eric K. Miller > > > > > > > > wrote: > > > > > > > > > > > Actually, in my case, the setup is originally deploy by Kolla > > > > > > Ansible. Other than the initial deployment, I am looking for > > > > > > using Kolla Ansible for maintenance operations. > > > > > > What I am looking for, eg. replace a host, can surely be done by > > > > > > manual steps or customized script. I'd like to know if they are > > > > > > automated by Kolla Ansible. > > > > > > > > > > We do this often by simply using the "limit" flag in Kolla Ansible > > > > > to > > > > only include the controllers and new compute node (after adding the > > > > compute node to the multinode.ini file). Specify "reconfigure" for > > > > the action, and not "install". > > > > > > > > We need some better docs around this, and I think they will be added > > > > soon. Some things to watch out for: > > > > > > > > * if adding a new controller, ensure that if using --limit, all > > > > controllers are included and do not use serial mode > > > > > > What I tried was to replace a controller, where I don't need to update > > > other controllers, because there is no address update. > > > > > > If there is address update caused by controller change, then all > > > controllers have to be included to get update. > > > > While this may work at the moment, we have just merged a change that > > prevents this. For keystone, we need access to all controllers, to > > determine whether it is a new cluster or a new node in an existing > > cluster. > > > > > > > > What's "serial mode"? > > > > Ansible has a feature to run plays in batches of some % of the hosts. > > In Kolla Ansible you can e.g. export ANSIBLE_SERIAL=0.1. It's an > > advanced use case and needs some care. > > > > > > > > > * if removing a controller, reconfigure other controllers to update > > > > the RabbitMQ & Galera cluster nodes etc. > > > > > > In this case, are those services who don't need any updates going to > > > be restarted or untouched? > > Could you comment on this? This is my biggest concern. I'd like > to ensure services who don't need update remain untouched. In general, Kolla Ansible will only restart containers if the config files or container configuration changes. There is a bug in Ansible which means that this isn't always true, e.g. if nova-api needs to restart, we may also restart nova-conductor on the same host. See https://bugs.launchpad.net/kolla-ansible/+bug/1863510 > > Thanks! 
> Tony > From radoslaw.piliszek at gmail.com Tue Aug 25 08:01:35 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 25 Aug 2020 10:01:35 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: Hi Luis, I've added you. -yoctozepto On Tue, Aug 25, 2020 at 7:35 AM Luis Ramirez wrote: > +1 I'll try to do my best. Please add me to the core > > Br, > Luis Rmz > Blockchain, DevOps & Open Source Cloud Solutions Architect > ---------------------------------------- > Founder & CEO > OpenCloud.es > luis.ramirez at opencloud.es > Skype ID: d.overload > Hangouts: luis.ramirez at opencloud.es > [image: ] +34 911 950 123 / [image: ]+39 392 1289553 / [image: ]+49 > 152 26917722 / Česká republika: +420 774 274 882 > ----------------------------------------------------- > > > El mar., 25 ago. 2020 a las 6:52, Sam P () escribió: > >> Thank you all for volunteering to maintain the project. >> > Please let me know how we should proceed with the meetings. >> > I can start them on Tuesdays at 7 AM UTC. >> > And since the Masakari own channel is quite a peaceful one, I would >> > suggest to run them there directly. >> > What are your thoughts? :-) >> I think #openstack-masakari channel is all set to conduct the meeting. >> I am totally OK with that. And Tuesday at 7AM UTC also works for me. >> Previously we conducted the meeting every two weeks (on even weeks). >> How about others? >> Please add comments to the following review. >> https://review.opendev.org/#/c/747819/ >> >> --- Regards, >> Sampath >> >> On Tue, Aug 25, 2020 at 1:29 PM Sam P wrote: >> > >> > Hi All, >> > >> > I add the following members to the core team. >> > >> > Fabian Zimmermann dev.faz at googlemail.com >> > Jegor van Opdorpjegor at greenedge.cloud >> > Radosław Piliszekradoslaw.piliszek at gmail.com >> > suzhengweisugar-2008 at 163.com >> > >> > Please let me or other core members know if any one else would like to >> > join the core team. >> > --- Regards, >> > Sampath >> > >> > On Sat, Aug 22, 2020 at 2:08 AM Fabian Zimmermann >> wrote: >> > > >> > > Hi, >> > > >> > > As long as there are enough cores to keep the project running >> everything is fine :) >> > > >> > > Fabian >> > > >> > > Jean-Philippe Evrard schrieb am Fr., 21. >> Aug. 2020, 16:32: >> > >> >> > >> >> > >> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: >> > >> > Hi, >> > >> > >> > >> > if nobody complains I also would like to request core status to >> help getting the project further. >> > >> > >> > >> > Fabian Zimmermann >> > >> >> > >> Let's hope this will not be lost in the list :) >> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Tue Aug 25 08:03:32 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 25 Aug 2020 10:03:32 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: Hi Sampath, Thanks for handling this. I'll sit down to clean up the queue a bit and ask other new cores to co-review and merge a few waiting patches. -yoctozepto On Tue, Aug 25, 2020 at 6:40 AM Sam P wrote: > > Hi All, > > I add the following members to the core team. 
> > Fabian Zimmermann dev.faz at googlemail.com > Jegor van Opdorpjegor at greenedge.cloud > Radosław Piliszekradoslaw.piliszek at gmail.com > suzhengweisugar-2008 at 163.com > > Please let me or other core members know if any one else would like to > join the core team. > --- Regards, > Sampath > > On Sat, Aug 22, 2020 at 2:08 AM Fabian Zimmermann wrote: > > > > Hi, > > > > As long as there are enough cores to keep the project running everything is fine :) > > > > Fabian > > > > Jean-Philippe Evrard schrieb am Fr., 21. Aug. 2020, 16:32: > >> > >> > >> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: > >> > Hi, > >> > > >> > if nobody complains I also would like to request core status to help getting the project further. > >> > > >> > Fabian Zimmermann > >> > >> Let's hope this will not be lost in the list :) > >> > From radoslaw.piliszek at gmail.com Tue Aug 25 08:08:52 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 25 Aug 2020 10:08:52 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: Hi New Cores, Please join #openstack-masakari on Freenode and let me know your IRC nicknames so that we can recognize each other. I probably know some of your nicks already but it's best to refresh. :-) The string in my message signature is my IRC nick in case you were wondering what spell that is. :-) -yoctozepto On Tue, Aug 25, 2020 at 6:49 AM Sam P wrote: > > Thank you all for volunteering to maintain the project. > > Please let me know how we should proceed with the meetings. > > I can start them on Tuesdays at 7 AM UTC. > > And since the Masakari own channel is quite a peaceful one, I would > > suggest to run them there directly. > > What are your thoughts? :-) > I think #openstack-masakari channel is all set to conduct the meeting. > I am totally OK with that. And Tuesday at 7AM UTC also works for me. > Previously we conducted the meeting every two weeks (on even weeks). > How about others? > Please add comments to the following review. > https://review.opendev.org/#/c/747819/ > > --- Regards, > Sampath > > On Tue, Aug 25, 2020 at 1:29 PM Sam P wrote: > > > > Hi All, > > > > I add the following members to the core team. > > > > Fabian Zimmermann dev.faz at googlemail.com > > Jegor van Opdorpjegor at greenedge.cloud > > Radosław Piliszekradoslaw.piliszek at gmail.com > > suzhengweisugar-2008 at 163.com > > > > Please let me or other core members know if any one else would like to > > join the core team. > > --- Regards, > > Sampath > > > > On Sat, Aug 22, 2020 at 2:08 AM Fabian Zimmermann wrote: > > > > > > Hi, > > > > > > As long as there are enough cores to keep the project running everything is fine :) > > > > > > Fabian > > > > > > Jean-Philippe Evrard schrieb am Fr., 21. Aug. 2020, 16:32: > > >> > > >> > > >> On Wed, Aug 19, 2020, at 06:23, Fabian Zimmermann wrote: > > >> > Hi, > > >> > > > >> > if nobody complains I also would like to request core status to help getting the project further. 
> > >> > > > >> > Fabian Zimmermann > > >> > > >> Let's hope this will not be lost in the list :) > > >> > From zapiec at gonicus.de Tue Aug 25 08:44:16 2020 From: zapiec at gonicus.de (Benjamin Zapiec) Date: Tue, 25 Aug 2020 10:44:16 +0200 Subject: Scaling control nodes Message-ID: <23e0e705-1446-dc32-74d2-5959fdba6368@gonicus.de> Hello everyone, while trying openstack i referred to the red hat installation documentation which is okay but lead to one question. It looks like there is no problem in scaling compute nodes if you run out of resources. But scaling the controller nodes is not supported by red hat. Since I'm using the official tripleo openstack version and not the red hat version i was wondering if this is not supported by the openstack project. Having in mind that red hat doesn't support this i was looking for something that tells me that it is supported (or not) by the tripleo openstack project. But i didn't found anything explicit. So may you tell me if it is possible to scale up Controller Nodes? And if not which component is not scalable by tripleo? Is it possible to create an controller profile that is scalable? Best regards -- Benjamin Zapiec (System Engineer) * GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg * Tel.: +49 2932 916-0 * Fax: +49 2932 916-245 * http://www.GONICUS.de * Sitz der Gesellschaft: Moehnestrasse 55 * D-59755 Arnsberg * Geschaeftsfuehrer: Rainer Luelsdorf, Alfred Schroeder * Vorsitzender des Beirats: Juergen Michels * Amtsgericht Arnsberg * HRB 1968 Wir erfüllen unsere Informationspflichten zum Datenschutz gem. der Artikel 13 und 14 DS-GVO durch Veröffentlichung auf unserer Internetseite unter: https://www.gonicus.de/datenschutz oder durch Zusendung auf Ihre formlose Anfrage. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From sandeep.ee.nagendra at gmail.com Tue Aug 25 09:34:41 2020 From: sandeep.ee.nagendra at gmail.com (sandeep) Date: Tue, 25 Aug 2020 15:04:41 +0530 Subject: [cliff] [dev] Cliff auto completion not working inside interactive mode Message-ID: Hi Team, In my system, I am trying auto completion for my CLI application. CLIFF version - cliff==3.4.0 Auto complete works fine on bash prompt. But inside the interactive shell, auto complete does not work. Below is the output for the help command inside the interactive shell. (appcli) help Miscellaneous help topics: ========================== help Application commands (type help ): ========================================= complete snapshot list reports service restart service-object-type app2 service restart service-object-type app3 service restart service-object-type app4 service show state service-object-type app1 service show state service-object-type app2 service show state service-object-type app3 service show state service-object-type app4 swm rollback node swm cancel sw-update swm downgrade node swm list sw-info swm show sw-info swm start sw-update file swm start sw-downgrade help Now, if I type swm and press tab, it lists all the sub commands under it. (appcli) swm cancel sw-update list sw-info start sw-update file downgrade node rollback node show sw-info start sw-downgrade But if i type, (appcli) swm s gives below output, (appcli) swm "s It stops at this point and further pressing tab does not autocomplete. Could you please let me know what could be the problem? Is this a known issue? or Am i missing something? 
Thanks, Sandeep -------------- next part -------------- An HTML attachment was scrubbed... URL: From hjensas at redhat.com Tue Aug 25 10:35:47 2020 From: hjensas at redhat.com (Harald Jensas) Date: Tue, 25 Aug 2020 12:35:47 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: On 8/20/20 5:49 PM, Julia Kreger wrote: > I suspect we're going to have lots of push back from the TripleO > community because there has been resistance to change their default > usage in the past. As such I'm adding them to the subject so hopefully > they will be at least aware. Since TripleO already support using the direct interface, it's recommended and tested by the TripleO group focusing on edge type deployments, switching to direct by default might not be too much of a hassle for TripleO. We may want to change the disk-image format used by TripleO to raw as well, to benefit from the raw image streaming capabilities? Or would enabling image_download_source = http convert the images as they are cached on conductors? (see question inline below.) > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur wrote: >> >> Hi all, >> >> Side note for those lacking context: this proposal concerns deprecating one of the ironic deploy interfaces detailed in https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It does not affect the boot-from-iSCSI feature. >> >> I would like to propose deprecating and removing the 'iscsi' deploy interface over the course of the next 2 cycles. The reasons are: >> 1) The iSCSI deploy is a source of occasional cryptic bugs when a target cannot be discovered or mounted properly. >> 2) Its security is questionable: I don't think we even use authentication. >> 3) Operators confusion: right now we default to the iSCSI deploy but pretty much direct everyone who cares about scalability or security to the 'direct' deploy. >> 4) Cost of maintenance: our feature set is growing, our team - not so much. iscsi_deploy.py is 800 lines of code that can be removed, and some dependencies that can be dropped as well. >> >> As far as I can remember, we've kept the iSCSI deploy for two reasons: >> 1) The direct deploy used to require Glance with Swift backend. The recently added [agent]image_download_source option allows caching and serving images via the ironic's HTTP server, eliminating this problem. I guess we'll have to switch to 'http' by default for this option to keep the out-of-box experience. >> 2) Memory footprint of the direct deploy. With the raw images streaming we no longer have to cache the downloaded images in the agent memory, removing this problem as well (I'm not even sure how much of a problem it is in 2020, even my phone has 4GiB of RAM). >> When using image_download_source = http, does Ironic convert non-raw images when they are placed on each conductors cache? To benefit from the raw image streaming? >> If this proposal is accepted, I suggest to execute it as follows: >> Victoria release: >> 1) Put an early deprecation warning in the release notes. >> 2) Announce the future change of the default value for [agent]image_download_source. >> W release: >> 3) Change [agent]image_download_source to 'http' by default. >> 4) Remove iscsi from the default enabled_deploy_interfaces and move it to the back of the supported list (effectively making direct deploy the default). >> X release: >> 5) Remove the iscsi deploy code from both ironic and IPA. >> >> Thoughts, opinions, suggestions? 
>> >> Dmitry > From dtantsur at redhat.com Tue Aug 25 10:59:47 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Tue, 25 Aug 2020 12:59:47 +0200 Subject: [ironic][tripleo] RFC: deprecate the iSCSI deploy interface? In-Reply-To: References: Message-ID: On Tue, Aug 25, 2020 at 12:39 PM Harald Jensas wrote: > On 8/20/20 5:49 PM, Julia Kreger wrote: > > I suspect we're going to have lots of push back from the TripleO > > community because there has been resistance to change their default > > usage in the past. As such I'm adding them to the subject so hopefully > > they will be at least aware. > > Since TripleO already support using the direct interface, it's > recommended and tested by the TripleO group focusing on edge type > deployments, switching to direct by default might not be too much of a > hassle for TripleO. > ++ > > We may want to change the disk-image format used by TripleO to raw as > well, to benefit from the raw image streaming capabilities? Or would > enabling image_download_source = http convert the images as they are > cached on conductors? (see question inline below.) > > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur > wrote: > >> > >> Hi all, > >> > >> Side note for those lacking context: this proposal concerns deprecating > one of the ironic deploy interfaces detailed in > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It > does not affect the boot-from-iSCSI feature. > >> > >> I would like to propose deprecating and removing the 'iscsi' deploy > interface over the course of the next 2 cycles. The reasons are: > >> 1) The iSCSI deploy is a source of occasional cryptic bugs when a > target cannot be discovered or mounted properly. > >> 2) Its security is questionable: I don't think we even use > authentication. > >> 3) Operators confusion: right now we default to the iSCSI deploy but > pretty much direct everyone who cares about scalability or security to the > 'direct' deploy. > >> 4) Cost of maintenance: our feature set is growing, our team - not so > much. iscsi_deploy.py is 800 lines of code that can be removed, and some > dependencies that can be dropped as well. > >> > >> As far as I can remember, we've kept the iSCSI deploy for two reasons: > >> 1) The direct deploy used to require Glance with Swift backend. The > recently added [agent]image_download_source option allows caching and > serving images via the ironic's HTTP server, eliminating this problem. I > guess we'll have to switch to 'http' by default for this option to keep the > out-of-box experience. > >> 2) Memory footprint of the direct deploy. With the raw images streaming > we no longer have to cache the downloaded images in the agent memory, > removing this problem as well (I'm not even sure how much of a problem it > is in 2020, even my phone has 4GiB of RAM). > >> > > When using image_download_source = http, does Ironic convert non-raw > images when they are placed on each conductors cache? To benefit from > the raw image streaming? > Yes, unless it's explicitly disabled. Although storing raw images from the beginning may make deployments a bit faster and save some disk space for this conversion. Dmitry > > >> If this proposal is accepted, I suggest to execute it as follows: > >> Victoria release: > >> 1) Put an early deprecation warning in the release notes. > >> 2) Announce the future change of the default value for > [agent]image_download_source. > >> W release: > >> 3) Change [agent]image_download_source to 'http' by default. 
> >> 4) Remove iscsi from the default enabled_deploy_interfaces and move it > to the back of the supported list (effectively making direct deploy the > default). > >> X release: > >> 5) Remove the iscsi deploy code from both ironic and IPA. > >> > >> Thoughts, opinions, suggestions? > >> > >> Dmitry > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From masayuki.igawa at gmail.com Tue Aug 25 11:06:54 2020 From: masayuki.igawa at gmail.com (Masayuki Igawa) Date: Tue, 25 Aug 2020 20:06:54 +0900 Subject: [qa] Wallaby PTG planning Message-ID: Hi, We need to start thinking about the next cycle already. As you probably know, next virtual PTG will be held in October 26-30[0]. I prepared an etherpad[1] to discuss and track our topics. So, please add your name if you are going to attend the PTG session. And also, please add your proposals of the topics which you want to discuss during the PTG. I also made a doodle[2] with possible time slots. Please put your best days and hours so that we can try to schedule and book our sessions in the time slots. [0] https://www.openstack.org/ptg/ [1] https://etherpad.opendev.org/p/qa-wallaby-ptg [2] https://doodle.com/poll/qqd7ayz3i4ubnsbb Best Regards, -- Masayuki Igawa Key fingerprint = C27C 2F00 3A2A 999A 903A 753D 290F 53ED C899 BF89 From radoslaw.piliszek at gmail.com Tue Aug 25 11:45:52 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 25 Aug 2020 13:45:52 +0200 Subject: [qa] Wallaby PTG planning In-Reply-To: References: Message-ID: Thanks, Masayuki. I added myself. I hope we can get it non-colliding with Kolla meetings this time. I'll try to do a better job at early collision detection. :-) -yoctozepto On Tue, Aug 25, 2020 at 1:16 PM Masayuki Igawa wrote: > > Hi, > > We need to start thinking about the next cycle already. > As you probably know, next virtual PTG will be held in October 26-30[0]. > > I prepared an etherpad[1] to discuss and track our topics. So, please add > your name if you are going to attend the PTG session. And also, please add > your proposals of the topics which you want to discuss during the PTG. > > I also made a doodle[2] with possible time slots. Please put your best days and hours > so that we can try to schedule and book our sessions in the time slots. > > [0] https://www.openstack.org/ptg/ > [1] https://etherpad.opendev.org/p/qa-wallaby-ptg > [2] https://doodle.com/poll/qqd7ayz3i4ubnsbb > > Best Regards, > -- Masayuki Igawa > Key fingerprint = C27C 2F00 3A2A 999A 903A 753D 290F 53ED C899 BF89 > From mnaser at vexxhost.com Tue Aug 25 13:23:27 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 25 Aug 2020 09:23:27 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. 
# Patches ## Open Reviews - Add assert:supports-standalone https://review.opendev.org/722399 - Add etcd3gw to Oslo https://review.opendev.org/747188 - Update and simplify comparison of working groups https://review.opendev.org/746763 - Drop requirement of 1/3 positive TC votes to land https://review.opendev.org/746711 - Resolution to define distributed leadership for projects https://review.opendev.org/744995 - Move towards dual office hours in diff TZ https://review.opendev.org/746167 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Drop all exceptions for legacy validation https://review.opendev.org/745403 - Move towards single office hour https://review.opendev.org/745200 ## General Changes - Fix names inside check-review-status https://review.opendev.org/745913 # Email Threads - Zuul Native Jobs Goal Update #2: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016561.html - Masakari Project Aliveness: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016520.html - vPTG October 2020 Signup: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016497.html - OpenStack Client vs python-*clients: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016409.html Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From ts-takahashi at nec.com Tue Aug 25 05:40:52 2020 From: ts-takahashi at nec.com (=?utf-8?B?VEFLQUhBU0hJIFRPU0hJQUtJKOmrmOapi+OAgOaVj+aYjik=?=) Date: Tue, 25 Aug 2020 05:40:52 +0000 Subject: [tacker] IRC meeting In-Reply-To: <636004ca-130b-58ee-c769-19169926fcee@gmail.com> References: <636004ca-130b-58ee-c769-19169926fcee@gmail.com> Message-ID: Hi Yasufumi and Tacker team, Can I host the meeting? I have 1 topic, feedback from NFV-TST. Regards, Toshiaki -------------------------------------------------  Toshiaki Takahashi  E-mail: ts-takahashi at nec.com ------------------------------------------------- > -----Original Message----- > From: Yasufumi Ogawa > Sent: Tuesday, August 25, 2020 2:28 PM > To: openstack-discuss > Subject: [tacker] IRC meeting > > Hi tacker team, > > I am not available to join IRC meeting today unfortunately. I would like to > suggest to anyone host the meeting, or skip it if no items. > > Thanks, > Yasufumi From sandeep.ee.nagendra at gmail.com Tue Aug 25 06:15:41 2020 From: sandeep.ee.nagendra at gmail.com (sandeep) Date: Tue, 25 Aug 2020 11:45:41 +0530 Subject: [Cliff] [dev] auto completion not working inside interactive mode In-Reply-To: References: Message-ID: Hi Team, In my system, I am trying auto completion for my CLI application. *CLIFF version - cliff==3.4.0* Auto complete works fine on bash prompt. But inside the interactive shell, auto complete does not work. Below is the screenshot for the help command inside the interactive shell. [image: image.png] Now, if I type swm and press tab, it lists all the sub commands under it. But, swm s gives swm "s and further command auto completion does not work. [image: image.png] Could you please let me know what could be the problem? Is this a known issue? or Am i missing something? Thanks, Sandeep -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 28735 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 7345 bytes Desc: not available URL: From amy at demarco.com Tue Aug 25 05:28:33 2020 From: amy at demarco.com (Amy Marrich) Date: Tue, 25 Aug 2020 00:28:33 -0500 Subject: [openstack-community] Error add member to pool ( OCTAVIA ) when using SSL to verify In-Reply-To: <692B1576-9AB1-46F9-9328-0D510DDCEE01@hxcore.ol> References: <692B1576-9AB1-46F9-9328-0D510DDCEE01@hxcore.ol> Message-ID: <59EC5E93-FC3F-4EDC-A874-9A2F466B37DC@demarco.com> Adding the OpenStack discuss list. Amy (spotz) > On Aug 24, 2020, at 11:14 PM, Vinh Nguyen Duc wrote: > >  > Dear Openstack community, > > My name is Duc Vinh, I am newer in Openstack > I am deploy Openstack Ussuri on Centos8 , I am using three nodes controller with High Availability topology and using HAproxy to verify cert for connect HTTPS, > I have trouble with project Octavia, I cannot add member in a pool after created Loadbalancer, listener, pool ( everything is fine). > Here is my log and configuration file: > > LOGS: > > 2020-08-25 10:55:42.872 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension security-group found enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2020-08-25 10:55:42.892 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension dns-integration is not enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:70 > 2020-08-25 10:55:42.911 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension qos found enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2020-08-25 10:55:42.933 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension allowed-address-pairs found enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2020-08-25 10:55:43.068 226250 WARNING keystoneauth.identity.generic.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Failed to discover available identity versions when contacting https://192.168.10.150:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Error retrieving subnet (subnet id: 035f3183-f469-415f-b536-b4a81364e814.: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. 
SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base chunked=chunked) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._validate_conn(conn) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base conn.connect() > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 344, in connect > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base ssl_context=context) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 367, in ssl_wrap_socket > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return context.wrap_socket(sock) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 365, in wrap_socket > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base _context=self, _session=session) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 776, in __init__ > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self.do_handshake() > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._sslobj.do_handshake() > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._sslobj.do_handshake() > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base timeout=timeout > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File 
"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base _stacktrace=sys.exc_info()[2]) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base raise MaxRetryError(_pool, url, error or ResponseError(cause)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1004, in _send_request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = self.session.request(method, url, **kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 533, in request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = self.send(prep, **send_kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 646, in send > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base r = adapter.send(request, **kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 514, in send > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base raise SSLError(e, request=request) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base requests.exceptions.SSLError: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 138, in _do_create_plugin > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base authenticated=False) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 610, in get_discovery > 2020-08-25 10:55:43.070 226250 ERROR 
octavia.network.drivers.neutron.base authenticated=authenticated) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 1452, in get_discovery > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base disc = Discover(session, url, authenticated=authenticated) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 536, in __init__ > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base authenticated=authenticated) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 102, in get_version_data > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = session.get(url, headers=headers, authenticated=authenticated) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.request(url, 'GET', **kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 913, in request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = send(**kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1008, in _send_request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base raise exceptions.SSLError(msg) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py", line 193, in _get_resource > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resource_type)(resource_id) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 869, in show_subnet > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.get(self.subnet_path % (subnet), params=_params) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 354, in get > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base headers=headers, params=params) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File 
"/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 331, in retry_request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base headers=headers, params=params) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 282, in do_request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base headers=headers) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 339, in do_request > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._check_uri_length(url) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 332, in _check_uri_length > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base uri_len = len(self.endpoint_url) + len(url) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 346, in endpoint_url > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.get_endpoint() > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 282, in get_endpoint > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.session.get_endpoint(auth or self.auth, **kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1225, in get_endpoint > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return auth.get_endpoint(self, **kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 380, in get_endpoint > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base allow_version_hack=allow_version_hack, **kwargs) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 271, in get_endpoint_data > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base service_catalog = self.get_access(session).service_catalog > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 134, in get_access > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self.auth_ref = self.get_auth_ref(session) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 206, in get_auth_ref > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._plugin = self._do_create_plugin(session) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 161, in _do_create_plugin > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base 'auth_url is correct. 
%s' % e) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > 2020-08-25 10:55:43.074 226250 DEBUG wsme.api [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Client-side error: Subnet 035f3183-f469-415f-b536-b4a81364e814 not found. format_exception /usr/lib/python3.6/site-packages/wsme/api.py:222 > 2020-08-25 10:55:43.076 226250 DEBUG octavia.common.keystone [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Request path is / and it does not require keystone authentication process_request /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77 > 2020-08-25 10:55:43.080 226250 DEBUG octavia.common.keystone [req-5091d326-0cb4-4ae1-bf4b-9ef6b9313dca - - - - -] Request path is / and it does not require keystone authentication process_request /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77 > > Configuration: > [root at controller01 ~]# cat /etc/octavia/octavia.conf > [DEFAULT] > > log_dir = /var/log/octavia > debug = True > transport_url = rabbit://openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.178:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.179:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.28:5672 > > [api_settings] > api_base_uri = https://192.168.10.150:9876 > bind_host = 192.168.10.178 > bind_port = 9876 > auth_strategy = keystone > healthcheck_enabled = True > allow_tls_terminated_listeners = True > > [database] > connection = mysql+pymysql://octavia:FUkbii8AY4G6H9LxbJ2RRlOzHN61X8PI8FrMcuXQ at 192.168.10.150/octavia > max_retries = -1 > > [health_manager] > bind_port = 5555 > bind_ip = 192.168.10.178 > controller_ip_port_list = 192.168.10.178:5555, 192.168.10.179:5555, 192.168.10.28:5555 > heartbeat_key = insecure > > [keystone_authtoken] > service_token_roles_required = True > www_authenticate_uri = https://192.168.10.150:5000 > auth_url = https://192.168.10.150:5000 > region_name = Hanoi > memcached_servers = 192.168.10.178:11211,192.168.10.179:11211,192.168.10.28:11211 > auth_type = password > project_domain_name = Default > user_domain_name = Default > project_name = service > username = octavia > password = esGn3rN3iJOAD2HXmqznFPI9oAY2wQNDWYwqJaCH > cafile = /etc/ssl/private/haproxy.pem > insecure = false > > > [certificates] > cert_generator = local_cert_generator > #server_certs_key_passphrase = insecure-key-do-not-use-this-key > ca_private_key_passphrase = esGn3rN3iJOAD2HXmqznFPI9oAY2wQNDWYwqJaCH > ca_private_key = /etc/octavia/certs/server_ca.key.pem > ca_certificate = /etc/octavia/certs/server_ca.cert.pem > region_name = Hanoi > ca_certificates_file = /etc/ssl/private/haproxy.pem > endpoint_type = internal > > [networking] > #allow_vip_network_id = True > #allow_vip_subnet_id = True > #allow_vip_port_id = True > > [haproxy_amphora] > #bind_port = 9443 > server_ca = /etc/octavia/certs/server_ca.cert.pem > client_cert = /etc/octavia/certs/client.cert-and-key.pem > base_path = 
/var/lib/octavia > base_cert_dir = /var/lib/octavia/certs > connection_max_retries = 1500 > connection_retry_interval = 1 > > [controller_worker] > amp_image_tag = amphora > amp_ssh_key_name = octavia > amp_secgroup_list = 80f44b73-dc9f-48aa-a0b8-8b78e5c6585c > amp_boot_network_list = 04425cb2-5963-48f5-a229-b89b7c6036bd > amp_flavor_id = 200 > network_driver = allowed_address_pairs_driver > compute_driver = compute_nova_driver > amphora_driver = amphora_haproxy_rest_driver > client_ca = /etc/octavia/certs/client_ca.cert.pem > loadbalancer_topology = SINGLE > amp_active_retries = 9999 > > [task_flow] > [oslo_messaging] > topic = octavia_prov > rpc_thread_pool_size = 2 > > [house_keeping] > [amphora_agent] > [keepalived_vrrp] > > [service_auth] > auth_url = https://192.168.10.150:5000 > auth_type = password > project_domain_name = default > user_domain_name = default > project_name = admin > username = admin > password = F35sXAYW5qDlMGfQbhmexIx12DqrQdpw6ixAseTd > cafile = /etc/ssl/private/haproxy.pem > region_name = Hanoi > memcached_servers = 192.168.10.178:11211,192.168.10.179:11211,192.168.10.28:11211 > #insecure = true > > > [glance] > ca_certificates_file = /etc/ssl/private/haproxy.pem > region_name = Hanoi > endpoint_type = internal > insecure = false > > [neutron] > ca_certificates_file = /etc/ssl/private/haproxy.pem > region_name = Hanoi > endpoint_type = internal > insecure = false > > [cinder] > ca_certificates_file = /etc/ssl/private/haproxy.pem > region_name = Hanoi > endpoint_type = internal > insecure = false > > [nova] > ca_certificates_file = /etc/ssl/private/haproxy.pem > region_name = Hanoi > endpoint_type = internal > insecure = false > > [oslo_policy] > #policy_file = /etc/octavia/policy.json > > [oslo_messaging_notifications] > transport_url = rabbit://openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.178:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.179:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.28:5672 > > _______________________________________________ > Community mailing list > Community at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/community -------------- next part -------------- An HTML attachment was scrubbed... URL: From harishkumarivaturi at gmail.com Tue Aug 25 12:30:27 2020 From: harishkumarivaturi at gmail.com (HARISH KUMAR Ivaturi) Date: Tue, 25 Aug 2020 14:30:27 +0200 Subject: Openstack with Nginx Support Message-ID: Hi I am Harish Kumar, Master Student at BTH, Karlskrona, Sweden. I am working on my Master thesis at BTH and my thesis topic is Performance evaluation of OpenStack with HTTP/3. I have successfully built curl and nginx with HTTP/3 support and I am performing some commands using curl for generating tokens so i could access the services of OpenStack. OpenStack relies with the Apache web server and I could not get any results using Nginx HTTP/3 . I would like to ask if there is any official documentation on OpenStack relying with Nginx?, I have searched in the internet reg. this info but could not get any, I would like to use nginx instead of apache web server , so I could get some results by performing curl and commands and nginx web server (with http/3 support). Please let me know and if there is any content please share with me. I hope you have understood this. It would be helpful for my Master Thesis. BR Harish Kumar -------------- next part -------------- An HTML attachment was scrubbed... 
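One common pattern for the setup asked about above, since most OpenStack API services (Keystone, for example) are WSGI applications, is to run them under uwsgi and let Nginx terminate TLS in front of them instead of Apache. A minimal, untested sketch of such an Nginx server block follows; every path, port and socket name is an assumption for illustration only, and the HTTP/3 listener directives would additionally depend on the patched Nginx build in use:

server {
    # TLS termination in Nginx; certificate paths are placeholders.
    listen 5000 ssl;
    ssl_certificate     /etc/ssl/private/keystone.crt;
    ssl_certificate_key /etc/ssl/private/keystone.key;

    location / {
        # Proxy to a Keystone public-API uwsgi instance bound to a local
        # socket (for example, uwsgi running the keystone-wsgi-public script).
        include uwsgi_params;
        uwsgi_pass unix:/var/run/uwsgi/keystone-wsgi-public.socket;
    }
}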
URL: From wbedyk at suse.de Tue Aug 25 12:56:51 2020 From: wbedyk at suse.de (Witek Bedyk) Date: Tue, 25 Aug 2020 14:56:51 +0200 Subject: [monasca] Deprecate monasca-transform repository Message-ID: Hello, this message is to announce the deprecation of openstack/monasca-transform repository. The project will not accept new development on master branch but accept fixes on stable branches. It will follow the process described in Project Team Guide [1]. Please reply to this message until Sept. 7 if you would like to take over the development and maintenance of this repository. Thanks Witek [1] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository From rosmaita.fossdev at gmail.com Tue Aug 25 14:16:09 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 25 Aug 2020 10:16:09 -0400 Subject: [all][infra] READMEs of zuul roles not rendered properly - missing content In-Reply-To: References: <20200824143618.7xdecj67m5jzwpkz@yuggoth.org> Message-ID: <14978702-3919-943f-2750-3ecae1201a68@gmail.com> On 8/24/20 11:05 AM, Clark Boylan wrote: > On Mon, Aug 24, 2020, at 7:36 AM, Jeremy Stanley wrote: >> On 2020-08-24 16:12:17 +0200 (+0200), Martin Kopec wrote: >>> I've noticed that READMEs of zuul roles within openstack projects >>> are not rendered properly on opendev.org - ".. zuul:rolevar::" >>> syntax seems to be the problem. Although it's rendered well on >>> github.com, see f.e. [1] [2]. [snip] >> To be entirely honest, I wish Gitea didn't automatically attempt to >> render RST files, that makes it harder to actually refer to the >> source code for them, and it's a source code browser not a CMS for >> publishing documentation, but apparently this is a feature many >> other users do like for some reason. > > We can change this behavior by removing the external renderer (though I expect we're in the minority of preferring ability to link to the source here). This may be a bigger minority that you think ... I put up a patch to change the default behavior to not render RST, so anyone with a strong opinion, please comment on the patch: https://review.opendev.org/#/c/747796/ > > [3] https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gitea/templates/app.ini.j2#L88-L95 > [4] https://opendev.org/opendev/system-config/src/branch/master/docker/gitea/Dockerfile#L92-L94 > >> -- >> Jeremy Stanley > From emilien at redhat.com Tue Aug 25 14:32:32 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 25 Aug 2020 10:32:32 -0400 Subject: [tripleo] no recheck please Message-ID: We're hitting the docker rate limits very badly right now and while our mitigation patch will land [1], please refrain from approving or recheck patches for now. I've cleared the gate and I'll take care of re-adding these patches into the gate when things will be stable again. [1] https://review.opendev.org/#/c/746993 Thanks for your understanding and your patience! -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From pramchan at yahoo.com Tue Aug 25 15:53:39 2020 From: pramchan at yahoo.com (prakash RAMCHANDRAN) Date: Tue, 25 Aug 2020 15:53:39 +0000 (UTC) Subject: Openstack with Nginx Support (HARISH KUMAR Ivaturi) In-Reply-To: References: Message-ID: <680450507.7321322.1598370819069@mail.yahoo.com> Harish, Note Horizon dashboard is based on Django framework over Apache. 
Thus logically it should work if you deploy Django over Nginx and please refer to link getting Django and once you have that rest should work as the Model, View, Controller (MVC)  take care of addressing the rest. I have not seen any Ngnix deployment of Open stack, but a single domain Open stack Controller  should be possible to deploy with Nginx. You can also reach out to Ngnix or F5 team to help you out, as this is a good  exercise for leveraging capability of Nginix for OpenSrack https://uwsgi-docs.readthedocs.io/en/latest/tutorials/Django_and_nginx.html ThanksPrakash ---------------------------------------------------------------------- Message: 1 Date: Tue, 25 Aug 2020 14:30:27 +0200 From: HARISH KUMAR Ivaturi To: openstack-discuss at lists.openstack.org Subject: Openstack with Nginx Support Message-ID:     Content-Type: text/plain; charset="utf-8" Hi I am Harish Kumar, Master Student at BTH, Karlskrona, Sweden. I am working on my Master thesis at BTH and my thesis topic is Performance evaluation of OpenStack with HTTP/3. I have successfully built curl and nginx with HTTP/3 support and I am performing some commands using curl for generating tokens so i could access the services of OpenStack. OpenStack relies with the Apache web server and I could not get any results using Nginx HTTP/3 . I would like to ask if there is any official documentation on OpenStack relying with Nginx?, I have searched in the internet reg. this info but could not get any, I would like to use nginx instead of apache web server , so I could get some results by performing curl and commands and nginx web server (with http/3 support). Please let me know and if there is any content please share with me. I hope you have understood this. It would be helpful for my Master Thesis. BR Harish Kumar -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Aug 25 16:23:49 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 25 Aug 2020 16:23:49 +0000 Subject: [OSSA-2020-006] Nova: Live migration fails to update persistent domain XML (CVE-2020-17376) Message-ID: <20200825162348.heaisepopqhmnfli@yuggoth.org> =================================================================== OSSA-2020-006: Live migration fails to update persistent domain XML =================================================================== :Date: August 25, 2020 :CVE: CVE-2020-17376 Affects ~~~~~~~ - Nova: <19.3.1, >=20.0.0 <20.3.1, ==21.0.0 Description ~~~~~~~~~~~ Tadayoshi Hosoya (NEC) and Lee Yarwood (Red Hat) reported a vulnerability in Nova live migration. By performing a soft reboot of an instance which has previously undergone live migration, a user may gain access to destination host devices that share the same paths as host devices previously referenced by the virtual machine on the source. This can include block devices that map to different Cinder volumes on the destination than the source. The risk is increased significantly in non-default configurations allowing untrusted users to initiate live migrations, so administrators may consider temporarily disabling this in policy if they cannot upgrade immediately. This only impacts deployments where users are allowed to perform soft reboots of server instances; it is recommended to disable soft reboots in policy (only allowing hard reboots) until the fix can be applied. 
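For deployments in the non-default configuration mentioned above (untrusted users allowed to initiate live migrations), a further temporary mitigation is to restore the admin-only default for the live-migration policy until the fix is applied. A minimal override sketch, assuming Nova's standard rule name and a JSON policy file; the value shown is simply Nova's upstream default:

{
    "os_compute_api:os-migrate-server:migrate_live": "rule:admin_api"
}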
Patches ~~~~~~~ - https://review.opendev.org/747978 (Pike) - https://review.opendev.org/747976 (Queens) - https://review.opendev.org/747975 (Rocky) - https://review.opendev.org/747974 (Stein) - https://review.opendev.org/747973 (Train) - https://review.opendev.org/747972 (Ussuri) - https://review.opendev.org/747969 (Victoria) Credits ~~~~~~~ - Tadayoshi Hosoya from NEC (CVE-2020-17376) - Lee Yarwood from Red Hat (CVE-2020-17376) References ~~~~~~~~~~ - https://launchpad.net/bugs/1890501 - http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-17376 Notes ~~~~~ - The stable/rocky, stable/queens, and stable/pike branches are under extended maintenance and will receive no new point releases, but patches for them are provided as a courtesy. -- Jeremy Stanley OpenStack Vulnerability Management Team -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From rfolco at redhat.com Tue Aug 25 23:22:16 2020 From: rfolco at redhat.com (Rafael Folco) Date: Tue, 25 Aug 2020 20:22:16 -0300 Subject: [tripleo] TripleO CI Summary: Unified Sprint 31 Message-ID: Greetings, The TripleO CI team has just completed **Unified Sprint 31** (July 31 thru Aug 20). The following is a summary of completed work during this sprint cycle*: - Continued building internal component and integration pipelines for rhos-16.2. - Added more jobs to the component and integration pipelines. - Completed promoter code and test scenarios to run on CentOS8/Python3. - Continued merging changes to switch to the new configuration engine in promoter code. - Merged all patches for CentOS-7 -> CentOS-8 stable/train upstream migration. - Design improvements to Tempest scenario manager are under review. - Python3 support on diskimage-builder and buildimage role in tripleo-ci repo is under review. - Ruck/Rover recorded notes [1]. The planned work for the next sprint extends the work started in the previous sprint and focuses on the following: - Downstream OSP 16.2 pipeline. - Next-gen promoter changes (new configuration engine). - Dependency pipeline design to early detect breakages in the OS. - New container naming prefix on Victoria/Master onwards. The Ruck and Rover for this sprint are Arx Cruz (arxcruz) and Amol Kahat (akahat). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes to be tracked in hackmd [2]. Thanks, rfolco *TripleO-CI team is now using an internal JIRA instance to track sprint work [1] https://hackmd.io/QnprH9-yRTi6uWlEfaahoQ [2] https://hackmd.io/FUalpr55TJuy28QLp2tLng -- Folco -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Wed Aug 26 02:06:54 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 25 Aug 2020 22:06:54 -0400 Subject: [tripleo] no recheck please In-Reply-To: References: Message-ID: We merged: https://review.opendev.org/747953 - Disable docker.io mirrors (makes direct calls against registry API instead of going through proxy via single public IP and hit rate limits) https://review.opendev.org/746993 - Use new modify_only_with_source (reduces number of hits against registry API) So for now it's safe to recheck / +2 +A patches again. 
Thanks for your patience On Tue, Aug 25, 2020 at 10:32 AM Emilien Macchi wrote: > We're hitting the docker rate limits very badly right now and while our > mitigation patch will land [1], please refrain from approving or recheck > patches for now. > I've cleared the gate and I'll take care of re-adding these patches into > the gate when things will be stable again. > > [1] https://review.opendev.org/#/c/746993 > > Thanks for your understanding and your patience! > -- > Emilien Macchi > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From Istvan.Szabo at agoda.com Wed Aug 26 06:57:33 2020 From: Istvan.Szabo at agoda.com (Szabo, Istvan (Agoda)) Date: Wed, 26 Aug 2020 06:57:33 +0000 Subject: DB Prune Message-ID: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> Hi, We have a cluster where the user continuously spawn and delete servers which makes the db even in compressed state 1.1GB. I'm sure it has a huge amount of trash because this is a cicd environment and the prod just uses 75MB. How is it possible to cleanup the db on a safe way, what should be the steps? Best regards, Istvan ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre-samuel.le-stang at corp.ovh.com Wed Aug 26 07:12:23 2020 From: pierre-samuel.le-stang at corp.ovh.com (Pierre-Samuel LE STANG) Date: Wed, 26 Aug 2020 09:12:23 +0200 Subject: DB Prune In-Reply-To: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> References: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> Message-ID: <20200826071223.dih7c7pbeat3sqah@corp.ovh.com> Hey, You may have a look at OSArchiver (OpenStack DB archiver) which is a tool we use at OVHCloud to archive our OpenStack databases. We open sourced it last year but this is not an official OpenStack tool. https://github.com/ovh/osarchiver -- PS Szabo, Istvan (Agoda) wrote on mer. [2020-août-26 06:57:33 +0000]: > Hi, > > We have a cluster where the user continuously spawn and delete servers which > makes the db even in compressed state 1.1GB. > I’m sure it has a huge amount of trash because this is a cicd environment and > the prod just uses 75MB. > How is it possible to cleanup the db on a safe way, what should be the steps? > > Best regards, > Istvan > > > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ > > This message is confidential and is for the sole use of the intended recipient > (s). It may also be privileged or otherwise protected by copyright or other > legal rules. If you have received it by mistake please let us know by reply > email and delete it from your system. It is prohibited to copy this message or > disclose its content to anyone. 
Any confidentiality or privilege is not waived > or lost by any mistaken delivery or unauthorized disclosure of the message. All > messages sent to and from Agoda may be monitored to ensure compliance with > company policies, to protect the company's interests and to remove potential > malware. Electronic messages may be intercepted, amended, lost or deleted, or > contain viruses. > -- Pierre-Samuel Le Stang From arne.wiebalck at cern.ch Wed Aug 26 08:30:56 2020 From: arne.wiebalck at cern.ch (Arne Wiebalck) Date: Wed, 26 Aug 2020 10:30:56 +0200 Subject: [baremetal-sig][ironic] Future work and regular meetings Message-ID: <4f6c5ffd-0929-f516-4299-f69892b1d434@cern.ch> Dear all, With the release of the bare metal white paper [0] the bare metal SIG has completed its first target and is now ready to tackle new challenges. A number of potential topics the SIG could work on were raised during the recent opendev events. The suggestions are summarised on the bare metal etherpad [1]. To select and organise the future work, we feel that it may be better to start with regular meetings, though: the current idea is once a month, for one hour, on zoom. Based on the experience with the ad-hoc meetings we had so far I have set up a doodle to pick the exact slot: https://doodle.com/poll/3hpypw73455t2g24 If interested, please respond by the end of this week. Equally, if you have additional suggestions for the next focus of the SIG, do not hesitate to add them to [1]. Thanks! Arne [0] https://www.openstack.org/use-cases/bare-metal/how-ironic-delivers-abstraction-and-automation-using-open-source-infrastructure [1] https://etherpad.opendev.org/p/bare-metal-sig From zhangbailin at inspur.com Wed Aug 26 08:33:44 2020 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Wed, 26 Aug 2020 08:33:44 +0000 Subject: [nova] Add 'accel_uuids' parameter to rebuild() function of the virt driver Message-ID: Hi all. In Ussuri release we were completed the nova-cyborg-interaction feature, but there are some operations of instance were blocked [1], we will support evacuate/rebuild [2] and/or shelve/unshelve [2] instance with accelerator in Victoria release. In [2] we will add 'accel_uuids' parameter to the rebuild() method of virt driver and Ironic driver, in virt/driver [4] we are not implemented the rebuild() method, and the 'accel_uuids' will be ignored in virt/ironic/driver. [1] https://docs.openstack.org/api-guide/compute/accelerator-support.html [2] Cyborg evacuate/rebuild support https://review.opendev.org/#/c/715326 [3] Cyborg shelve/unshelve support https://review.opendev.org/#/c/729563 [4] https://github.com/openstack/nova/blob/master/nova/virt/driver.py#L285 [5] https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1669 brinzhang -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Wed Aug 26 08:38:00 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 26 Aug 2020 17:38:00 +0900 Subject: [horizon] default create_volume setting can't be changed In-Reply-To: <20200824141904.Horde.biUwyDcXRQDK2D0KW6vwbE1@webmail.nde.ag> References: <20200824141904.Horde.biUwyDcXRQDK2D0KW6vwbE1@webmail.nde.ag> Message-ID: Hi Eugen, I also noticed this and filed a bug report at https://bugs.launchpad.net/horizon/+bug/1892990. It was caused by a missing comma in REST_API_REQUIRED_SETTINGS in openstack_dashboard/defaults.py. It was fixed in the master this month. It affects stable/train and stable/ussuri branches. 
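As a short illustration of why a single missing comma can silently hide a setting (the adjacent entry names below are examples, not necessarily the exact pair in defaults.py): adjacent string literals in a Python list are concatenated into one item instead of raising an error, so the affected setting simply disappears from the required list.

REST_API_REQUIRED_SETTINGS = [
    'LAUNCH_INSTANCE_DEFAULTS'   # missing comma merges this with the next literal
    'OPENSTACK_IMAGE_FORMATS',
]
print(REST_API_REQUIRED_SETTINGS)
# ['LAUNCH_INSTANCE_DEFAULTSOPENSTACK_IMAGE_FORMATS']
# The merged name matches no real setting, so LAUNCH_INSTANCE_DEFAULTS is
# never exposed to the dashboard and its defaults cannot be overridden.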
I proposed backports to ussuri and train respectively. Cloud you try the stable/train backport? https://review.opendev.org/#/q/I1eae4be4464f55a29d169403a70c958c3b8a308b Thanks, Akihiro Motoki (irc: amotoki) On Mon, Aug 24, 2020 at 11:21 PM Eugen Block wrote: > > Hi *, > > we recently upgraded from Ocata to Train and I'm struggling with a > specific setting: I believe since Pike version the default for > "create_volume" changed to "true" when launching instances from > Horizon dashboard. I would like to change that to "false" and set it > in our custom > /srv/www/openstack-dashboard/openstack_dashboard/local/local_settings.d/_100_local_settings.py: > > > LAUNCH_INSTANCE_DEFAULTS = { > 'config_drive': False, > 'create_volume': False, > 'hide_create_volume': False, > 'disable_image': False, > 'disable_instance_snapshot': False, > 'disable_volume': False, > 'disable_volume_snapshot': False, > 'enable_scheduler_hints': True, > } > > Other configs from this file work as expected, so that custom file > can't be the reason. > After apache and memcached restart nothing changes, the default is > still "true". Can anyone shed some light, please? I haven't tried > other configs yet so I can't tell if more options are affected. > > Thanks! > Eugen > > From eblock at nde.ag Wed Aug 26 08:55:09 2020 From: eblock at nde.ag (Eugen Block) Date: Wed, 26 Aug 2020 08:55:09 +0000 Subject: [horizon] default create_volume setting can't be changed In-Reply-To: References: <20200824141904.Horde.biUwyDcXRQDK2D0KW6vwbE1@webmail.nde.ag> Message-ID: <20200826085509.Horde.q7sOlWnVkEVEkT1R3RLt0NM@webmail.nde.ag> Hi, thank you very much for the confirmation and the bug report. Setting the comma seems to do the trick, I reverted my own changes and only added the comma, after restarting apache the dashboard applied my settings. Thanks for the quick solution! Best regards, Eugen Zitat von Akihiro Motoki : > Hi Eugen, > > I also noticed this and filed a bug report at > https://bugs.launchpad.net/horizon/+bug/1892990. > It was caused by a missing comma in REST_API_REQUIRED_SETTINGS in > openstack_dashboard/defaults.py. > It was fixed in the master this month. It affects stable/train and > stable/ussuri branches. > > I proposed backports to ussuri and train respectively. > Cloud you try the stable/train backport? > https://review.opendev.org/#/q/I1eae4be4464f55a29d169403a70c958c3b8a308b > > Thanks, > Akihiro Motoki (irc: amotoki) > > On Mon, Aug 24, 2020 at 11:21 PM Eugen Block wrote: >> >> Hi *, >> >> we recently upgraded from Ocata to Train and I'm struggling with a >> specific setting: I believe since Pike version the default for >> "create_volume" changed to "true" when launching instances from >> Horizon dashboard. I would like to change that to "false" and set it >> in our custom >> /srv/www/openstack-dashboard/openstack_dashboard/local/local_settings.d/_100_local_settings.py: >> >> >> LAUNCH_INSTANCE_DEFAULTS = { >> 'config_drive': False, >> 'create_volume': False, >> 'hide_create_volume': False, >> 'disable_image': False, >> 'disable_instance_snapshot': False, >> 'disable_volume': False, >> 'disable_volume_snapshot': False, >> 'enable_scheduler_hints': True, >> } >> >> Other configs from this file work as expected, so that custom file >> can't be the reason. >> After apache and memcached restart nothing changes, the default is >> still "true". Can anyone shed some light, please? I haven't tried >> other configs yet so I can't tell if more options are affected. >> >> Thanks! 
>> Eugen >> >> From thierry at openstack.org Wed Aug 26 09:00:52 2020 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 26 Aug 2020 11:00:52 +0200 Subject: [largescale-sig] Next meeting: August 26, 8utc In-Reply-To: References: Message-ID: <362079ea-ef67-4e7b-a4f5-2f9ea17e7f95@openstack.org> During our meeting today we discussed Summit/PTG plans, and agreed to request one Forum session on scaling stories, and one PTG short meeting to replace our regular meeting that week. Meeting logs at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-08-26-08.00.html TODOs: - all to contact US large deployment friends to invite them to next EU-US meeting - ttx to request Forum/PTG sessions - belmoreira, ttx to push for OSops resurrection - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - masahito to push latest patches to oslo.metrics - ttx to look into a basic test framework for oslo,metrics - amorin to see if oslo.metrics could be tested at OVH Next meetings: Sep 9, 16:00UTC; Sep 23, 8:00UTC (#openstack-meeting-3) -- Thierry Carrez (ttx) From balazs.gibizer at est.tech Wed Aug 26 12:15:29 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Wed, 26 Aug 2020 14:15:29 +0200 Subject: DB Prune In-Reply-To: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> References: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> Message-ID: On Wed, Aug 26, 2020 at 06:57, "Szabo, Istvan (Agoda)" wrote: > Hi, > > We have a cluster where the user continuously spawn and delete > servers which makes the db even in compressed state 1.1GB. > I’m sure it has a huge amount of trash because this is a cicd > environment and the prod just uses 75MB. > How is it possible to cleanup the db on a safe way, what should be > the steps? > From Nova perspective you can get rid of the data of the already deleted instances via the following two commands: nova-manage db archive_deleted_rows nova-manage db purge Cheers, gibi [1]https://docs.openstack.org/nova/latest/cli/nova-manage.html > > Best regards, > Istvan > > > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by > copyright or other legal rules. If you have received it by mistake > please let us know by reply email and delete it from your system. It > is prohibited to copy this message or disclose its content to anyone. > Any confidentiality or privilege is not waived or lost by any > mistaken delivery or unauthorized disclosure of the message. All > messages sent to and from Agoda may be monitored to ensure compliance > with company policies, to protect the company's interests and to > remove potential malware. Electronic messages may be intercepted, > amended, lost or deleted, or contain viruses. From sean.mcginnis at gmx.com Wed Aug 26 14:12:22 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 26 Aug 2020 09:12:22 -0500 Subject: [releases] Dropping my releases core/release-manager hat In-Reply-To: References: Message-ID: <4d71f05a-69db-76b8-6976-a4a2784f1124@gmx.com> On 8/21/20 9:35 AM, Jean-Philippe Evrard wrote: > Hello folks, > > I am sad to announce that, while super motivated to keep helping the team, I cannot reliably and consistantly do my duties of core in the releases team, due to my current duties at work. 
> > It's been a while I haven't significantly helped the release team, and the team deserve all the transparency and clarity it can get about its contributors. It's time for me to step down. > > It's been a pleasure to help the team while it lasted. If you are looking for a team to get involved in OpenStack, make no mistake, the release team is awesome. Thank you everyone in the team, you were all amazing and so welcoming :) > > Regards, > Jean-Philippe Evrard (evrardjp) > Thanks for all your help with everything you've done JP. Just let us know if the situation changes in the future. Sean From witold.bedyk at suse.com Wed Aug 26 15:38:27 2020 From: witold.bedyk at suse.com (Witek Bedyk) Date: Wed, 26 Aug 2020 17:38:27 +0200 Subject: [monasca] Retire monasca-analytics repository Message-ID: Hello, this message is to announce the retirement of openstack/monasca-analytics repository. The project will not accept any new patches. It will follow the process described in Project Team Guide [1]. Please reply to this message until Sept. 7 if you would like to take over the development and maintenance of this repository. Thanks Witek [1] https://docs.openstack.org/project-team-guide/repository.html#retiring-a-repository From mark at stackhpc.com Wed Aug 26 16:17:31 2020 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 26 Aug 2020 17:17:31 +0100 Subject: [kolla] Kayobe config walkthrough docs call Message-ID: Hi, In today's kolla IRC meeting we proposed to have a meeting to discuss the long awaited Kayobe configuration walkthrough documentation. We'll try to agree on an approach, and get people signed up for writing parts or all of it. The proposed meeting time is tomorrow (27th August) at 15:00 - 16:00 UTC, the same slot as the Kolla Klub (which is still on summer break). Please reply if you would like to attend but cannot make this slot. Google meet link: https://meet.google.com/xfg-ieza-qrz Regards, Mark From radoslaw.piliszek at gmail.com Wed Aug 26 17:38:03 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Wed, 26 Aug 2020 19:38:03 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: On Tue, Aug 25, 2020 at 10:03 AM Radosław Piliszek wrote: > I'll sit down to clean up the queue a bit and ask other new cores to > co-review and merge a few waiting patches. Aaand it's been done. :-) -yoctozepto From dev.faz at gmail.com Wed Aug 26 18:05:08 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 26 Aug 2020 20:05:08 +0200 Subject: [tc][masakari] Project aliveness (was: [masakari] Meetings) In-Reply-To: References: <6868fdd8-54cd-4ccf-a3d7-ffecf5eb601b@www.fastmail.com> Message-ID: <22517cce-3644-2918-b38a-c6fafac7aab4@googlemail.com> Hi, Am 26.08.20 um 19:38 schrieb Radosław Piliszek: > > Aaand it's been done. 
:-) I will check my emails tomorrow :) Have a nice evening, Fabian From cohuck at redhat.com Tue Aug 25 14:39:25 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Tue, 25 Aug 2020 16:39:25 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200820031621.GA24997@joy-OptiPlex-7040> References: <20200810074631.GA29059@joy-OptiPlex-7040> <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <20200819212234.223667b3@x1.home> <20200820031621.GA24997@joy-OptiPlex-7040> Message-ID: <20200825163925.1c19b0f0.cohuck@redhat.com> On Thu, 20 Aug 2020 11:16:21 +0800 Yan Zhao wrote: > On Wed, Aug 19, 2020 at 09:22:34PM -0600, Alex Williamson wrote: > > On Thu, 20 Aug 2020 08:39:22 +0800 > > Yan Zhao wrote: > > > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > > > |- [path to device] > > > > > > |--- migration > > > > > > | |--- self > > > > > > | | |---device_api > > > > > > | | |---mdev_type > > > > > > | | |---software_version > > > > > > | | |---device_id > > > > > > | | |---aggregator > > > > > > | |--- compatible > > > > > > | | |---device_api > > > > > > | | |---mdev_type > > > > > > | | |---software_version > > > > > > | | |---device_id > > > > > > | | |---aggregator > > > > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > > - Attribute is coupled with kobject > > > > > > > > Is that really that bad? You have the device with an embedded kobject > > > > anyway, and you can just put things into an attribute group? > > > > > > > > [Also, I think that self/compatible split in the example makes things > > > > needlessly complex. Shouldn't semantic versioning and matching already > > > > cover nearly everything? I would expect very few cases that are more > > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > > need that self/compatible split for that, either.] > > > Hi Cornelia, > > > > > > The reason I want to declare compatible list of attributes is that > > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > > as I demonstrated below, > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > > > > > and aggragator may be just one of such examples that 1:1 matching does not > > > fit. > > > > If you're suggesting that we need a new 'compatible' set for every > > aggregation, haven't we lost the purpose of aggregation? 
For example, > > rather than having N mdev types to represent all the possible > > aggregation values, we have a single mdev type with N compatible > > migration entries, one for each possible aggregation value. BTW, how do > > we have multiple compatible directories? compatible0001, > > compatible0002? Thanks, > > > do you think the bin_attribute I proposed yesterday good? > Then we can have a single compatible with a variable in the mdev_type and > aggregator. > > mdev_type=i915-GVTg_V5_{val1:int:2,4,8} > aggregator={val1}/2 I'm not really a fan of binary attributes other than in cases where we have some kind of binary format to begin with. IIUC, we basically have: - different partitioning (expressed in the mdev_type) - different number of partitions (expressed via the aggregator) - devices being compatible if the partitioning:aggregator ratio is the same (The multiple mdev_type variants seem to come from avoiding extra creation parameters, IIRC?) Would it be enough to export base_type=i915-GVTg_V5 aggregation_ratio= to express the various combinations that are compatible without the need for multiple sets of attributes? From anlin.kong at gmail.com Tue Aug 25 21:41:19 2020 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 26 Aug 2020 09:41:19 +1200 Subject: [openstack-community] Error add member to pool ( OCTAVIA ) when using SSL to verify In-Reply-To: <59EC5E93-FC3F-4EDC-A874-9A2F466B37DC@demarco.com> References: <692B1576-9AB1-46F9-9328-0D510DDCEE01@hxcore.ol> <59EC5E93-FC3F-4EDC-A874-9A2F466B37DC@demarco.com> Message-ID: >From the log, it seems like the HTTPS communication with Neutron failed, can you successfully talk to Neutron using HTTPS? You can also try to simulate the code here https://github.com/openstack/octavia/blob/stable%2Fussuri/octavia/network/drivers/neutron/base.py#L38 for testing. --- Lingxian Kong Senior Software Engineer Catalyst Cloud www.catalystcloud.nz On Wed, Aug 26, 2020 at 2:25 AM Amy Marrich wrote: > Adding the OpenStack discuss list. > > Amy (spotz) > > On Aug 24, 2020, at 11:14 PM, Vinh Nguyen Duc > wrote: > >  > > Dear Openstack community, > > > > My name is Duc Vinh, I am newer in Openstack > > I am deploy Openstack Ussuri on Centos8 , I am using three nodes > controller with High Availability topology and using HAproxy to verify > cert for connect HTTPS, > > I have trouble with project Octavia, I cannot add member in a pool after > created Loadbalancer, listener, pool ( everything is fine). 
> > Here is my log and configuration file: > > > > *LOGS: * > > > > 2020-08-25 10:55:42.872 226250 DEBUG octavia.network.drivers.neutron.base > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Neutron extension > security-group found enabled _check_extension_enabled > /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > > 2020-08-25 10:55:42.892 226250 DEBUG octavia.network.drivers.neutron.base > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Neutron extension > dns-integration is not enabled _check_extension_enabled > /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:70 > > 2020-08-25 10:55:42.911 226250 DEBUG octavia.network.drivers.neutron.base > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Neutron extension qos > found enabled _check_extension_enabled > /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > > 2020-08-25 10:55:42.933 226250 DEBUG octavia.network.drivers.neutron.base > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Neutron extension > allowed-address-pairs found enabled _check_extension_enabled > /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > > 2020-08-25 10:55:43.068 226250 WARNING keystoneauth.identity.generic.base > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Failed to discover > available identity versions when contacting https://192.168.10.150:5000. > Attempting to parse version from URL.: > keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to > https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', > port=5000): Max retries exceeded with url: / (Caused by > SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify > failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Error retrieving subnet > (subnet id: 035f3183-f469-415f-b536-b4a81364e814.: > keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find > versioned identity endpoints when attempting to authenticate. Please check > that your auth_url is correct. 
SSL exception connecting to > https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', > port=5000): Max retries exceeded with url: / (Caused by > SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify > failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in > urlopen > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base chunked=chunked) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in > _make_request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self._validate_conn(conn) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in > _validate_conn > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base conn.connect() > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 344, in > connect > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base ssl_context=context) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 367, in > ssl_wrap_socket > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base return context.wrap_socket(sock) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", > line 365, in wrap_socket > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base _context=self, _session=session) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", > line 776, in __init__ > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self.do_handshake() > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", > line 1036, in do_handshake > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self._sslobj.do_handshake() > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", > line 648, in do_handshake > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self._sslobj.do_handshake() > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed > (_ssl.c:897) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send > > 2020-08-25 10:55:43.070 226250 ERROR > 
octavia.network.drivers.neutron.base timeout=timeout > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in > urlopen > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > _stacktrace=sys.exc_info()[2]) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in > increment > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base raise MaxRetryError(_pool, url, > error or ResponseError(cause)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > urllib3.exceptions.MaxRetryError: > HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded > with url: / (Caused by SSLError(SSLError(1, '[SSL: > CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1004, in > _send_request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base resp = > self.session.request(method, url, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/requests/sessions.py", line 533, in > request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base resp = self.send(prep, > **send_kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/requests/sessions.py", line 646, in send > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base r = adapter.send(request, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/requests/adapters.py", line 514, in send > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base raise SSLError(e, request=request) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > requests.exceptions.SSLError: HTTPSConnectionPool(host='192.168.10.150', > port=5000): Max retries exceeded with url: / (Caused by > SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify > failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", > line 138, in _do_create_plugin > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base 
authenticated=False) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line > 610, in get_discovery > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 1452, in > get_discovery > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base disc = Discover(session, url, > authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 536, in > __init__ > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 102, in > get_version_data > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base resp = session.get(url, > headers=headers, authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in > get > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base return self.request(url, 'GET', > **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 913, in > request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base resp = send(**kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1008, in > _send_request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base raise exceptions.SSLError(msg) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to > https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', > port=5000): Max retries exceeded with url: / (Caused by > SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify > failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > File > "/usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py", > line 193, in _get_resource > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base resource_type)(resource_id) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 869, > in show_subnet > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base return self.get(self.subnet_path % > (subnet), params=_params) > > 2020-08-25 
10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 354, > in get > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base headers=headers, params=params) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 331, > in retry_request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base headers=headers, params=params) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 282, > in do_request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base headers=headers) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 339, in > do_request > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self._check_uri_length(url) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 332, in > _check_uri_length > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base uri_len = len(self.endpoint_url) + > len(url) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 346, in > endpoint_url > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base return self.get_endpoint() > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 282, in > get_endpoint > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base return > self.session.get_endpoint(auth or self.auth, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1225, in > get_endpoint > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base return auth.get_endpoint(self, > **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line > 380, in get_endpoint > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base > allow_version_hack=allow_version_hack, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line > 271, in get_endpoint_data > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base service_catalog = > self.get_access(session).service_catalog > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line > 134, in get_access > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self.auth_ref = > self.get_auth_ref(session) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", > line 206, in get_auth_ref > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base self._plugin 
= > self._do_create_plugin(session) > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base File > "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", > line 161, in _do_create_plugin > > 2020-08-25 10:55:43.070 226250 ERROR > octavia.network.drivers.neutron.base 'auth_url is correct. %s' % e) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find > versioned identity endpoints when attempting to authenticate. Please check > that your auth_url is correct. SSL exception connecting to > https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', > port=5000): Max retries exceeded with url: / (Caused by > SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify > failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.074 226250 DEBUG wsme.api > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Client-side error: > Subnet 035f3183-f469-415f-b536-b4a81364e814 not found. format_exception > /usr/lib/python3.6/site-packages/wsme/api.py:222 > > 2020-08-25 10:55:43.076 226250 DEBUG octavia.common.keystone > [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - > 8259463ce052437396afa845933afe4b - default default] Request path is / and > it does not require keystone authentication process_request > /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77 > > 2020-08-25 10:55:43.080 226250 DEBUG octavia.common.keystone > [req-5091d326-0cb4-4ae1-bf4b-9ef6b9313dca - - - - -] Request path is / and > it does not require keystone authentication process_request > /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77 > > > > *Configuration:* > > [root at controller01 ~]# cat /etc/octavia/octavia.conf > > [DEFAULT] > > > > log_dir = /var/log/octavia > > debug = True > > transport_url = rabbit:// > openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.178:5672, > openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.179:5672, > openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.28:5672 > > > > [api_settings] > > api_base_uri = https://192.168.10.150:9876 > > bind_host = 192.168.10.178 > > bind_port = 9876 > > auth_strategy = keystone > > healthcheck_enabled = True > > allow_tls_terminated_listeners = True > > > > [database] > > connection = mysql+pymysql:// > octavia:FUkbii8AY4G6H9LxbJ2RRlOzHN61X8PI8FrMcuXQ at 192.168.10.150/octavia > > max_retries = -1 > > > > [health_manager] > > bind_port = 5555 > > bind_ip = 192.168.10.178 > > controller_ip_port_list = 192.168.10.178:5555, 192.168.10.179:5555, > 192.168.10.28:5555 > > heartbeat_key = insecure > > > > [keystone_authtoken] > > service_token_roles_required = True > > www_authenticate_uri = https://192.168.10.150:5000 > > auth_url = https://192.168.10.150:5000 > > region_name = Hanoi > > memcached_servers = 192.168.10.178:11211,192.168.10.179:11211, > 192.168.10.28:11211 > > auth_type = password > > project_domain_name = Default > > user_domain_name = Default > > project_name = service > > username = octavia > > password = esGn3rN3iJOAD2HXmqznFPI9oAY2wQNDWYwqJaCH > > cafile = /etc/ssl/private/haproxy.pem > > insecure = false > > > > > > [certificates] > > cert_generator = local_cert_generator > > #server_certs_key_passphrase = insecure-key-do-not-use-this-key > > ca_private_key_passphrase = esGn3rN3iJOAD2HXmqznFPI9oAY2wQNDWYwqJaCH > > 
ca_private_key = /etc/octavia/certs/server_ca.key.pem > > ca_certificate = /etc/octavia/certs/server_ca.cert.pem > > region_name = Hanoi > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > endpoint_type = internal > > > > [networking] > > #allow_vip_network_id = True > > #allow_vip_subnet_id = True > > #allow_vip_port_id = True > > > > [haproxy_amphora] > > #bind_port = 9443 > > server_ca = /etc/octavia/certs/server_ca.cert.pem > > client_cert = /etc/octavia/certs/client.cert-and-key.pem > > base_path = /var/lib/octavia > > base_cert_dir = /var/lib/octavia/certs > > connection_max_retries = 1500 > > connection_retry_interval = 1 > > > > [controller_worker] > > amp_image_tag = amphora > > amp_ssh_key_name = octavia > > amp_secgroup_list = 80f44b73-dc9f-48aa-a0b8-8b78e5c6585c > > amp_boot_network_list = 04425cb2-5963-48f5-a229-b89b7c6036bd > > amp_flavor_id = 200 > > network_driver = allowed_address_pairs_driver > > compute_driver = compute_nova_driver > > amphora_driver = amphora_haproxy_rest_driver > > client_ca = /etc/octavia/certs/client_ca.cert.pem > > loadbalancer_topology = SINGLE > > amp_active_retries = 9999 > > > > [task_flow] > > [oslo_messaging] > > topic = octavia_prov > > rpc_thread_pool_size = 2 > > > > [house_keeping] > > [amphora_agent] > > [keepalived_vrrp] > > > > [service_auth] > > auth_url = https://192.168.10.150:5000 > > auth_type = password > > project_domain_name = default > > user_domain_name = default > > project_name = admin > > username = admin > > password = F35sXAYW5qDlMGfQbhmexIx12DqrQdpw6ixAseTd > > cafile = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > memcached_servers = 192.168.10.178:11211,192.168.10.179:11211, > 192.168.10.28:11211 > > #insecure = true > > > > > > [glance] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [neutron] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [cinder] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [nova] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [oslo_policy] > > #policy_file = /etc/octavia/policy.json > > > > [oslo_messaging_notifications] > > transport_url = rabbit:// > openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.178:5672, > openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.179:5672, > openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.28:5672 > > > _______________________________________________ > Community mailing list > Community at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/community > > -------------- next part -------------- An HTML attachment was scrubbed... 
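A CERTIFICATE_VERIFY_FAILED error like the one in the log above can be narrowed
down by repeating the keystone request outside of Octavia with the same CA
bundle. The short Python sketch below is illustrative only: the URL and CA path
are simply the auth_url and cafile values from the octavia.conf shown above,
and requests is used in place of keystoneauth just to keep the check minimal.

    import requests

    AUTH_URL = "https://192.168.10.150:5000"      # [keystone_authtoken] auth_url
    CA_FILE = "/etc/ssl/private/haproxy.pem"      # cafile from octavia.conf

    try:
        # Roughly the same GET that keystoneauth issues for version discovery.
        resp = requests.get(AUTH_URL, verify=CA_FILE, timeout=10)
        print("TLS verification OK, HTTP status:", resp.status_code)
    except requests.exceptions.SSLError as exc:
        # Same failure mode as the CERTIFICATE_VERIFY_FAILED traceback above.
        print("TLS verification failed:", exc)

If this reproduces the SSLError, the certificate served on port 5000 does not
chain to anything in /etc/ssl/private/haproxy.pem, and either that bundle or
the endpoint certificate needs fixing; if it succeeds, the problem is more
likely elsewhere in the Octavia configuration.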
URL: 
From yan.y.zhao at intel.com Wed Aug 26 06:41:17 2020
From: yan.y.zhao at intel.com (Yan Zhao)
Date: Wed, 26 Aug 2020 14:41:17 +0800
Subject: device compatibility interface for live migration with assigned devices
In-Reply-To: <20200825163925.1c19b0f0.cohuck@redhat.com>
References: <20200814051601.GD15344@joy-OptiPlex-7040>
 <20200818085527.GB20215@redhat.com>
 <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com>
 <20200818091628.GC20215@redhat.com>
 <20200818113652.5d81a392.cohuck@redhat.com>
 <20200820003922.GE21172@joy-OptiPlex-7040>
 <20200819212234.223667b3@x1.home>
 <20200820031621.GA24997@joy-OptiPlex-7040>
 <20200825163925.1c19b0f0.cohuck@redhat.com>
Message-ID: <20200826064117.GA22243@joy-OptiPlex-7040>

On Tue, Aug 25, 2020 at 04:39:25PM +0200, Cornelia Huck wrote:
<...>
> > do you think the bin_attribute I proposed yesterday good?
> > Then we can have a single compatible with a variable in the mdev_type and
> > aggregator.
> >
> > mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
> > aggregator={val1}/2
>
> I'm not really a fan of binary attributes other than in cases where we
> have some kind of binary format to begin with.
>
> IIUC, we basically have:
> - different partitioning (expressed in the mdev_type)
> - different number of partitions (expressed via the aggregator)
> - devices being compatible if the partitioning:aggregator ratio is the
> same
>
> (The multiple mdev_type variants seem to come from avoiding extra
> creation parameters, IIRC?)
>
> Would it be enough to export
> base_type=i915-GVTg_V5
> aggregation_ratio=
>
> to express the various combinations that are compatible without the
> need for multiple sets of attributes?
yes. I agree we need to decouple the mdev type name and the aggregator for
compatibility detection purposes.

Please allow me to describe the history and motivation of introducing the
aggregator.

Initially, we have the fixed mdev_types i915-GVTg_V5_1, i915-GVTg_V5_2,
i915-GVTg_V5_4 and i915-GVTg_V5_8; the number after i915-GVTg_V5 represents
the max number of instances allowed to be created for that type. It also
identifies how many resources are to be allocated for each type. So far
these have worked well for current Intel vGPUs, i.e. cutting the physical
GPU into several virtual pieces and sharing them among several VMs in a
pure mediation way. Fixed types are provided in advance, as we thought they
could meet the needs of most users, and users can tell the hardware
capability they acquired from the type name: the bigger the number, the
smaller the piece of physical hardware.

Then, when it comes to Scalable IOV in the near future, one physical device
can be cut into a large number of units at the hardware layer. The single
unit to be assigned to a guest can be very small, while one to several
units are grouped into an mdev. The fixed type scheme then becomes
cumbersome. Therefore, a new attribute, aggregator, is introduced to
specify the number of resources to be assigned based on the base resource
specified in the type name. E.g. if the type name is dsa-1dwq and the
aggregator is 30, then the resources assignable to the guest are 30 wqs in
a single created mdev; if the type name is dsa-2dwq and the aggregator is
15, then the resources assignable to the guest are also 30 wqs in a single
created mdev. (In this example, the rule for defining the type name is
different from the GVT case; here 1dwq means the wq number is 1. Yes, both
are current reality. :) )

Previously, we wanted to regard the two mdevs created with dsa-1dwq x 30
and dsa-2dwq x 15 as compatible, because the two mdevs consist of equal
resources.
But, as it's a burden to upper layer, we agree that if this condition happens, we still treat the two as incompatible. To fix it, either the driver should expose dsa-1dwq only, or the target dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30. Does it make sense? Thanks Yan From yan.y.zhao at intel.com Wed Aug 26 08:54:11 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 26 Aug 2020 16:54:11 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <47d216330e10152f0f5d27421da60a7b1c52e5f0.camel@redhat.com> References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <242591bb809b68c618f62fdc93d4f8ae7b146b6d.camel@redhat.com> <20200820040116.GB24121@joy-OptiPlex-7040> <20200820062725.GB24997@joy-OptiPlex-7040> <47d216330e10152f0f5d27421da60a7b1c52e5f0.camel@redhat.com> Message-ID: <20200826085411.GB22243@joy-OptiPlex-7040> On Thu, Aug 20, 2020 at 02:24:26PM +0100, Sean Mooney wrote: > On Thu, 2020-08-20 at 14:27 +0800, Yan Zhao wrote: > > On Thu, Aug 20, 2020 at 06:16:28AM +0100, Sean Mooney wrote: > > > On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote: > > > > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote: > > > > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote: > > > > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote: > > > > > > > On Tue, 18 Aug 2020 10:16:28 +0100 > > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote: > > > > > > > > > On 2020/8/18 下午4:55, Daniel P. Berrangé wrote: > > > > > > > > > > > > > > > > > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > > > > On 2020/8/14 下午1:16, Yan Zhao wrote: > > > > > > > > > > > > > > > > > > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote: > > > > > > > > > > > > > > > > > > On 2020/8/10 下午3:46, Yan Zhao wrote: > > > > > > > > > we actually can also retrieve the same information through sysfs, .e.g > > > > > > > > > > > > > > > > > > |- [path to device] > > > > > > > > > |--- migration > > > > > > > > > | |--- self > > > > > > > > > | | |---device_api > > > > > > > > > | | |---mdev_type > > > > > > > > > | | |---software_version > > > > > > > > > | | |---device_id > > > > > > > > > | | |---aggregator > > > > > > > > > | |--- compatible > > > > > > > > > | | |---device_api > > > > > > > > > | | |---mdev_type > > > > > > > > > | | |---software_version > > > > > > > > > | | |---device_id > > > > > > > > > | | |---aggregator > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes but: > > > > > > > > > > > > > > > > > > - You need one file per attribute (one syscall for one attribute) > > > > > > > > > - Attribute is coupled with kobject > > > > > > > > > > > > > > Is that really that bad? You have the device with an embedded kobject > > > > > > > anyway, and you can just put things into an attribute group? > > > > > > > > > > > > > > [Also, I think that self/compatible split in the example makes things > > > > > > > needlessly complex. Shouldn't semantic versioning and matching already > > > > > > > cover nearly everything? I would expect very few cases that are more > > > > > > > complex than that. Maybe the aggregation stuff, but I don't think we > > > > > > > need that self/compatible split for that, either.] 
> > > > > > > > > > > > Hi Cornelia, > > > > > > > > > > > > The reason I want to declare compatible list of attributes is that > > > > > > sometimes it's not a simple 1:1 matching of source attributes and target attributes > > > > > > as I demonstrated below, > > > > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > > > > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > > > > > > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > > > > > > > > > the way you are doing the nameing is till really confusing by the way > > > > > if this has not already been merged in the kernel can you chagne the mdev > > > > > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead of half the device > > > > > > > > > > currently you need to deived the aggratod by the number at the end of the mdev type to figure out > > > > > how much of the phsicial device is being used with is a very unfridly api convention > > > > > > > > > > the way aggrator are being proposed in general is not really someting i like but i thin this at least > > > > > is something that should be able to correct. > > > > > > > > > > with the complexity in the mdev type name + aggrator i suspect that this will never be support > > > > > in openstack nova directly requireing integration via cyborg unless we can pre partion the > > > > > device in to mdevs staicaly and just ignore this. > > > > > > > > > > this is way to vendor sepecif to integrate into something like openstack in nova unless we can guarentee > > > > > taht how aggreator work will be portable across vendors genericly. > > > > > > > > > > > > > > > > > and aggragator may be just one of such examples that 1:1 matching does not > > > > > > fit. > > > > > > > > > > for openstack nova i dont see us support anything beyond the 1:1 case where the mdev type does not change. > > > > > > > > > > > > > hi Sean, > > > > I understand it's hard for openstack. but 1:N is always meaningful. > > > > e.g. > > > > if source device 1 has cap A, it is compatible to > > > > device 2: cap A, > > > > device 3: cap A+B, > > > > device 4: cap A+B+C > > > > .... > > > > to allow openstack to detect it correctly, in compatible list of > > > > device 2, we would say compatible cap is A; > > > > device 3, compatible cap is A or A+B; > > > > device 4, compatible cap is A or A+B, or A+B+C; > > > > > > > > then if openstack finds device A's self cap A is contained in compatible > > > > cap of device 2/3/4, it can migrate device 1 to device 2,3,4. > > > > > > > > conversely, device 1's compatible cap is only A, > > > > so it is able to migrate device 2 to device 1, and it is not able to > > > > migrate device 3/4 to device 1. > > > > > > yes we build the palcement servce aroudn the idea of capablites as traits on resocue providres. > > > which is why i originally asked if we coudl model compatibality with feature flags > > > > > > we can seaislyt model deivce as aupport A, A+B or A+B+C > > > and then select hosts and evice based on that but > > > > > > the list of compatable deivce you are propsoeing hide this feature infomation which whould be what we are matching > > > on. 
> > > > > > give me a lset of feature you want and list ting the feature avaiable on each device allow highre level ocestation > > > to > > > easily match the request to a host that can fulllfile it btu thave a set of other compatihble device does not help > > > with > > > that > > > > > > so if a simple list a capabliteis can be advertiese d and if we know tha two dievce with the same capablity are > > > intercahangebale that is workabout i suspect that will not be the case however and it would onely work within a > > > familay > > > of mdevs that are closely related. which i think agian is an argument for not changeing the mdev type and at least > > > intially only look at migatreion where the mdev type doee not change initally. > > > > > > > sorry Sean, I don't understand your words completely. > > Please allow me to write it down in my words, and please confirm if my > > understanding is right. > > 1. you mean you agree on that each field is regarded as a trait, and > > openstack can compare by itself if source trait is a subset of target trait, right? > > e.g. > > source device > > field1=A1 > > field2=A2+B2 > > field3=A3 > > > > target device > > field1=A1+B1 > > field2=A2+B2 > > filed3=A3 > > > > then openstack sees that field1/2/3 in source is a subset of field1/2/3 in > > target, so it's migratable to target? > > yes this is basically how cpu feature work. > if we see the host cpu on the dest is a supperset of the cpu feature used > by the vm we know its safe to migrate. got it. glad to know it :) > > > > 2. mdev_type + aggregator make it hard to achieve the above elegant > > solution, so it's best to avoid the combined comparing of mdev_type + aggregator. > > do I understand it correctly? > yes and no. one of the challange that mdevs pose right now is that sometiem mdev model > independent resouces and sometimes multipe mdev types consume the same underlying resouces > there is know way for openstack to know if i915-GVTg_V5_2 and i915-GVTg_V5_4 consume the same resouces > or not. as such we cant do the accounting properly so i would much prefer to have just 1 mdev type > i915-GVTg and which models the minimal allocatable unit and then say i want 4 of them comsed into 1 device > then have a second mdev type that does that since > > what that means in pratice is we cannot trust the available_instances for a given mdev type > as consuming a different mdev type might change it. aggrators makes that problem worse. > which is why i siad i would prefer if instead of aggreator as prposed each consumable > resouce was reported indepenedly as different mdev types and then we composed those > like we would when bond ports creating an attachment or other logical aggration that refers > to instance of mdevs of differing type which we expose as a singel mdev that is exposed to the guest. > in a concreate example we might say create a aggreator of 64 cuda cores and 32 tensor cores and "bond them" > or aggrate them as a single attachme mdev and provide that to a ml workload guest. a differnt guest could request > 1 instace of the nvenc video encoder and one instance of the nvenc video decoder but no cuda or tensor for a video > transcoding workload. > The "bond" you described is a little different from the intension of the aggregator we introduced for scalable IOV. (as explained in another mail to Cornelia https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg06523.html). But any way, we agree that mdevs are not compatible if mdev_types are not compatible. 
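To make the subset comparison discussed above concrete, here is a minimal
illustrative sketch (not code from Nova, placement, or the kernel; the field
names are just the ones used in the example earlier in this thread). It treats
each exposed field as a set of capability flags and allows migration only when
every source set is contained in the corresponding target set, much like CPU
feature flags:

    def is_migratable(src, dst):
        # src and dst map a field/attribute name to a set of capability flags.
        # Migration is allowed only if every source set is a subset of the
        # corresponding target set (i.e. the target is a superset).
        return all(flags <= dst.get(field, set())
                   for field, flags in src.items())

    source = {"field1": {"A1"}, "field2": {"A2", "B2"}, "field3": {"A3"}}
    target = {"field1": {"A1", "B1"}, "field2": {"A2", "B2"}, "field3": {"A3"}}

    print(is_migratable(source, target))  # True  - target offers A1, A2+B2, A3
    print(is_migratable(target, source))  # False - source lacks B1, no reverse

A semantic version comparison (same major version, target minor version at
least the source's) would sit alongside such a check, as suggested earlier in
the thread.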
> if each of those componets are indepent mdev types and can be composed with that granularity then i think that approch > is better then the current aggreator with vendor sepcific fileds. > we can model the phsical device as being multipel nested resouces with different traits for each type of resouce and > different capsities for the same. we can even model how many of the attachments/compositions can be done indepently > if there is a limit on that. > > |- [parent physical device] > |--- Vendor-specific-attributes [optional] > |--- [mdev_supported_types] > | |--- [] > | | |--- create > | | |--- name > | | |--- available_instances > | | |--- device_api > | | |--- description > | | |--- [devices] > | |--- [] > | | |--- create > | | |--- name > | | |--- available_instances > | | |--- device_api > | | |--- description > | | |--- [devices] > | |--- [] > | |--- create > | |--- name > | |--- available_instances > | |--- device_api > | |--- description > | |--- [devices] > > a benifit of this appoch is we would be the mdev types would not change on migration > and we could jsut compuare a a simeple version stirgh and feature flag list to determin comaptiablity > in a vendor neutral way. i dont nessisarly need to know what the vendeor flags mean just that the dest is a subset of > the source and that the semaitic version numbers say the mdevs are compatible. > > as aggregator and some other attributes are only meaningful after devices are created, and vendors' naming of mdev types are not unified, do you think below way is good? |- [parent physical device] |--- [mdev_supported_types] | |--- [] | | |--- create | | |--- name | | |--- available_instances | | |--- compatible_type [must] | | |--- Vendor-specific-compatible-type-attributes [optional] | | |--- device_api [must] | | |--- software_version [must] | | |--- description | | |--- [devices] | | |--------[] | | | |--- vendor-specific-compatible-device-attriutes [optional] all vendor specific compatible attributes begin with compatible in name. in GVT's current case, |- 0000\:00\:02.0 |--- mdev_supported_types | |--- i915-GVTg_V5_8 | | |--- create | | |--- name | | |--- available_instances | | |--- compatible_type : i915-GVTg_V5_8, i915-GVTg_V4_8 | | |--- device_api : vfio-pci | | |--- software_version : 1.0.0 | | |--- compatible_pci_ids : 5931, 591b | | |--- description | | |--- devices | | | |- 882cc4da-dede-11e7-9180-078a62063ab1 | | | | | --- aggregator : 1 | | | | | --- compatible_aggregator : 1 suppose 882cc4da-dede-11e7-9180-078a62063ab1 is a src mdev. the sequence for openstack to find a compatible mdev in my mind is that 1. make src mdev type and compatible_type as traits. 2. look for a mdev type that is either i915-GVTg_V4_8 or i915-GVTg_V5_8 as that in compatible_type. (this is just an example, currently we only support migration between mdevs whose attributes are all matching, from mdev type to aggregator, to pci_ids) 3. if 2 fails, try to find a mdev type whose compatible_type is a superset of src compatible_type. if found one, go to step 4; otherwise, quit. 4. check if device_api, software_version under the type are compatible. 5. check if other vendor specific type attributes under the type are compatible. - check if src compatible_pci_ids is a subset of target compatible_pci_ids. 6. check if device is created and not occupied, if not, create one. 7. check if vendor specific attributes under the device are compatible. - check if src compatible_aggregator is a subset of target compatible_aggregator. 
if fails, try to find counterpart attribute of vendor specific device attribute and set target value according to compatible_xxx in source side. (for compatible_aggregator, its counterpart is aggregator.) if attribute aggregator exists, step 7 succeeds when setting of its value succeeds. if attribute aggregator does not exist, step 7 fails. 8. a compatible target is found. not sure if the above steps look good to you. some changes are required for compatibility check for physical device when mdev_type is absent. but let's first arrive at consensus for mdevs first :) > > 3. you don't like self list and compatible list, because it is hard for > > openstack to compare different traits? > > e.g. if we have self list and compatible list, then as below, openstack needs > > to compare if self field1/2/3 is a subset of compatible field 1/2/3. > currnetly we only use mdevs for vGPUs and in our documentaiton we tell customer > to model the mdev_type as a trait and request it as a reuiqred trait. > so for customer that are doing that today changing mdev types is not really an option. > we would prefer that they request the feature they need instead of a spefic mdev type > so we can select any that meets there needs > for example we have a bunch of traits for cuda support > https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/cuda.py > or driectx/vulkan/opengl https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/api.py > these are closely analogous to cpu feature flag lix avx or sse > https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86/__init__.py#L16 > > so when it comes to compatiablities it would be ideal if you could express capablities as something like > a cpu feature flag then we can eaisly model those as traits. > > > > source device: > > self field1=A1 > > self field2=A2+B2 > > self field3=A3 > > > > compatible field1=A1 > > compatible field2=A2;B2;A2+B2; > > compatible field3=A3 > > > > > > target device: > > self field1=A1+B1 > > self field2=A2+B2 > > self field3=A3 > > > > compatible field1=A1;B1;A1+B1; > > compatible field2=A2;B2;A2+B2; > > compatible field3=A3 > > > > > > Thanks > > Yan > > > > > > > > > > > > > > > > > i woudl really prefer if there was just one mdev type that repsented the minimal allcatable unit and the > > > > > aggragaotr where used to create compostions of that. i.e instad of i915-GVTg_V5_2 beign half the device, > > > > > have 1 mdev type i915-GVTg and if the device support 8 of them then we can aggrate 4 of i915-GVTg > > > > > > > > > > if you want to have muplie mdev type to model the different amoutn of the resouce e.g. i915-GVTg_small i915- > > > > > GVTg_large > > > > > that is totlaly fine too or even i915-GVTg_4 indcating it sis 4 of i915-GVTg > > > > > > > > > > failing that i would just expose an mdev type per composable resouce and allow us to compose them a the user > > > > > level > > > > > with > > > > > some other construct mudeling a attament to the device. e.g. create composed mdev or somethig that is an > > > > > aggreateion > > > > > of > > > > > multiple sub resouces each of which is an mdev. so kind of like how bond port work. we would create an mdev for > > > > > each > > > > > of > > > > > the sub resouces and then create a bond or aggrated mdev by reference the other mdevs by uuid then attach only > > > > > the > > > > > aggreated mdev to the instance. 
> > > > > > > > > > the current aggrator syntax and sematic however make me rather uncofrotable when i think about orchestating vms > > > > > on > > > > > top > > > > > of it even to boot them let alone migrate them. > > > > > > > > > > > > So, we explicitly list out self/compatible attributes, and management > > > > > > tools only need to check if self attributes is contained compatible > > > > > > attributes. > > > > > > > > > > > > or do you mean only compatible list is enough, and the management tools > > > > > > need to find out self list by themselves? > > > > > > But I think provide a self list is easier for management tools. > > > > > > > > > > > > Thanks > > > > > > Yan > > > > > > > > > > > > > > > > > > > From ankelezhang at gmail.com Wed Aug 26 09:30:24 2020 From: ankelezhang at gmail.com (Ankele zhang) Date: Wed, 26 Aug 2020 17:30:24 +0800 Subject: nova config vCenter and creating instance failed Message-ID: Hi all I have config vCenter in my nova.conf, cinder.conf and glance-api.conf. First of all, I can create VM inner vSphere successfully and I can create VM inner OpenStack without vCenter configuration successfully. Now I config vCenter driver in nova, cinder and glance. Creating images and volumes successfully, but when I create VM instance, I got the error message "Build of instance e3e8e049-98fc-486e-95c7-e17ec0e22e59 aborted: 主机配置过程中出错。" , in english is "Build of instance e3e8e049-98fc-486e-95c7-e17ec0e22e59 aborted: an error occurred during host configuration". And error in vCSA client is just "主机配置过程中出错" while creating vm. Environment: OpenStack(Rocky), vSphere(6.7), storage(iSCSI),network(OVS vlan),vCenter is VMware-VIM-all-6.7.0-16046470.iso installed in windows2012 server. I don't know where did my configuration error in OpenStack or something error in my vSphere. nova.conf: [default] ... compute_driver = vmwareapi.VMwareVCDriver [vmware] host_ip = 192.168.3.115 host_username = administrator at vsphere.local host_password = Zl at 123456 cluster_name = mycluster datastore_regex = Datastore_iscsi insecure = True vlan_interface = vmnic0 integration_bridge = br-int api_retry_count = 10 cinder.conf: [DEFAULT] enabled_backends = vmware default_volume_type = vmware [vmware] volume_driver = cinder.volume.drivers.vmware.vmdk.VMwareVcVmdkDriver vmware_host_ip=192.168.3.115 vmware_host_password=Zl at 123456 vmware_host_username=administrator at vsphere.local vmware_wsdl_location=https://192.168.3.115/sdk/vimService.wsdl vmware_volume_folder= openstack_volume vmware_datastore_regex = Datastore_iscsi vmware_insecure = True vmware_host_version = 6.7 glance-api.conf: [default] ... known_stores = vmware default_store = vmware [glance_store] filesystem_store_datadir = /tri_fs/images/ stores = files,http,vmware default_store = vsphere vmware_server_host = 192.168.3.115 vmware_server_username = administrator at vsphere.local vmware_server_password = Zl at 123456 vmware_datastore_name = Datastore_iscsi vmware_datacenter_path = Datacenter vmware_datastores = Datacenter:Datastore_iscsi vmware_task_poll_interval = 5 vmware_store_image_dir = /openstack_glance vmware_api_insecure = True I hope you can help me. Thank you very much! -------------- next part -------------- An HTML attachment was scrubbed... 
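A generic "an error occurred during host configuration" failure is easier to
narrow down if the vCenter credentials, datastore and cluster referenced in the
configuration above are confirmed to resolve correctly outside of OpenStack.
Below is a rough sketch using the standalone pyVmomi client; this is an
assumption for illustration only (it is not part of the Nova VMware driver),
and the password is a placeholder to be replaced with the real one.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Values taken from the nova.conf [vmware] section above.
    ctx = ssl._create_unverified_context()  # mirrors insecure = True
    si = SmartConnect(host="192.168.3.115",
                      user="administrator@vsphere.local",
                      pwd="<vcenter password>",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        for dc in content.rootFolder.childEntity:
            if not isinstance(dc, vim.Datacenter):
                continue
            print("Datacenter:", dc.name)
            print("  Datastores:", [ds.name for ds in dc.datastore])
            print("  Clusters:  ", [c.name for c in dc.hostFolder.childEntity
                                    if isinstance(c, vim.ClusterComputeResource)])
    finally:
        Disconnect(si)

If the cluster "mycluster" or a datastore matching "Datastore_iscsi" does not
show up in this listing, nova-compute will not be able to place the instance
either; the vCenter task and event log for the failed operation usually gives a
more specific reason than the generic message.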
URL: From mnaser at vexxhost.com Wed Aug 26 19:44:04 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 26 Aug 2020 15:44:04 -0400 Subject: [nova][barbican][qa] barbican-tempest-plugin change breaking bfv [ceph] Message-ID: Hi everyone, We just had our gating break due to a change merging inside barbican-tempest-plugin which is the following: https://review.opendev.org/#/c/515210/ It is resulting in an exception in our CI: 2020-08-26 18:04:32.663188 | controller | Response - Headers: {'content-length': '257', 'content-type': 'application/json', 'x-openstack-request-id': 'req-7f55e463-c3de-445e-a814-ef79c5f21235', 'connection': 'close', 'status': '409', 'content-location': 'http://glance.openstack.svc.cluster.local/v2/images/dec14e17-0870-415f-82a6-140c1b7e4a39'} 2020-08-26 18:04:32.663200 | controller | Body: b'{"message": "Image dec14e17-0870-415f-82a6-140c1b7e4a39 could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.

\\n\\n\\n", "code": "409 Conflict", "title": "Conflict"}' This is usually because it's trying to delete the Glance image before deleting an instance that is using it, the specific test that is failing is: barbican_tempest_plugin.tests.scenario.test_certificate_validation.CertificateValidationTest.test_signed_image_invalid_cert_boot_failure[compute,id-6d354881-35a6-4568-94b8-2204bbf67b29,image] This test landed yesterday, we're blacklisting the scenario right now. I do find it quite interesting that in the logs here: http://paste.openstack.org/show/797186/ That the instance reports that it _does_ indeed delete it, so maybe we are trying to delete the image afterwards _too quickly_ and need to wait for Nova to clean up? I'd love to enable that test again and continue full coverage, happy to hear discussion. Thanks Mohammed -- Mohammed Naser VEXXHOST, Inc. From amy at demarco.com Wed Aug 26 21:08:07 2020 From: amy at demarco.com (Amy Marrich) Date: Wed, 26 Aug 2020 16:08:07 -0500 Subject: [Diversity] Diversity & Inclusion WG Meeting 8/31 - Removing Divisive Language Message-ID: The Diversity & Inclusion WG has taken on the task from this week's Board meeting to assist with the development of the OSF's stance on the removal of Divisive Language within the OSF projects. The WG invites members of all OSF projects to participate in this effort and to join us at our next meeting Monday, August 31, at 17:00 UTC which will be held at https://meetpad.opendev.org/osf-diversity-and-inclusion. The agenda can be found at https://etherpad.openstack.org/p/diversity-wg-agenda. If you have any questions please let me and the team know here, on #openstack-diversity on IRC, or you can email me directly. Thanks, Amy Marrich (spotz) -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Thu Aug 27 00:58:17 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 26 Aug 2020 17:58:17 -0700 Subject: [openstack-community] Error add member to pool ( OCTAVIA ) when using SSL to verify In-Reply-To: <59EC5E93-FC3F-4EDC-A874-9A2F466B37DC@demarco.com> References: <692B1576-9AB1-46F9-9328-0D510DDCEE01@hxcore.ol> <59EC5E93-FC3F-4EDC-A874-9A2F466B37DC@demarco.com> Message-ID: Thank you again Amy. Hi Duc Vinh, Sorry to hear you are having trouble getting Octavia setup. It appears to be an issue with the certificate on the keystone endpoint. >From the log and your configuration I can see: Your keystone auth_url is https://192.168.10.150:5000 You CAfile for this endpoint is configured as: /etc/ssl/private/haproxy.pem Let's test that configuration by running the following command: echo "Q" | openssl s_client -connect 192.168.10.150:5000 -CAfile /etc/ssl/private/haproxy.pem This will return a lot of information about the certificate on the endpoint and test the CA file. In the output of this command, you want to see "Verification: OK". If you don't, there is a problem either with the certificate on the endpoint of the CA file being used. Check both match and are the expected files. If you are still not sure what is wrong, please send the output of the above command and the output of the following command: openssl x509 -in /etc/ssl/private/haproxy.pem -noout -text I will take a look at that information and should be able to help. Michael On Tue, Aug 25, 2020 at 7:19 AM Amy Marrich wrote: > > Adding the OpenStack discuss list. 
> > Amy (spotz) > > On Aug 24, 2020, at 11:14 PM, Vinh Nguyen Duc wrote: > >  > > Dear Openstack community, > > > > My name is Duc Vinh, I am newer in Openstack > > I am deploy Openstack Ussuri on Centos8 , I am using three nodes controller with High Availability topology and using HAproxy to verify cert for connect HTTPS, > > I have trouble with project Octavia, I cannot add member in a pool after created Loadbalancer, listener, pool ( everything is fine). > > Here is my log and configuration file: > > > > LOGS: > > > > 2020-08-25 10:55:42.872 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension security-group found enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > > 2020-08-25 10:55:42.892 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension dns-integration is not enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:70 > > 2020-08-25 10:55:42.911 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension qos found enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > > 2020-08-25 10:55:42.933 226250 DEBUG octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Neutron extension allowed-address-pairs found enabled _check_extension_enabled /usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > > 2020-08-25 10:55:43.068 226250 WARNING keystoneauth.identity.generic.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Failed to discover available identity versions when contacting https://192.168.10.150:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Error retrieving subnet (subnet id: 035f3183-f469-415f-b536-b4a81364e814.: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. 
SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base chunked=chunked) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._validate_conn(conn) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base conn.connect() > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 344, in connect > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base ssl_context=context) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 367, in ssl_wrap_socket > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return context.wrap_socket(sock) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 365, in wrap_socket > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base _context=self, _session=session) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 776, in __init__ > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self.do_handshake() > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._sslobj.do_handshake() > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._sslobj.do_handshake() > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base timeout=timeout > > 2020-08-25 10:55:43.070 226250 
ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base _stacktrace=sys.exc_info()[2]) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base raise MaxRetryError(_pool, url, error or ResponseError(cause)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1004, in _send_request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = self.session.request(method, url, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 533, in request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = self.send(prep, **send_kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 646, in send > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base r = adapter.send(request, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 514, in send > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base raise SSLError(e, request=request) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base requests.exceptions.SSLError: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 138, in _do_create_plugin > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base authenticated=False) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 
610, in get_discovery > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 1452, in get_discovery > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base disc = Discover(session, url, authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 536, in __init__ > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/discover.py", line 102, in get_version_data > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = session.get(url, headers=headers, authenticated=authenticated) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.request(url, 'GET', **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 913, in request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resp = send(**kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1008, in _send_request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base raise exceptions.SSLError(msg) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base keystoneauth1.exceptions.connection.SSLError: SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base During handling of the above exception, another exception occurred: > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base Traceback (most recent call last): > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py", line 193, in _get_resource > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base resource_type)(resource_id) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 869, in show_subnet > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.get(self.subnet_path % (subnet), params=_params) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 354, in get > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base headers=headers, params=params) > > 2020-08-25 
10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 331, in retry_request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base headers=headers, params=params) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 282, in do_request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base headers=headers) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 339, in do_request > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._check_uri_length(url) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 332, in _check_uri_length > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base uri_len = len(self.endpoint_url) + len(url) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/neutronclient/client.py", line 346, in endpoint_url > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.get_endpoint() > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 282, in get_endpoint > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return self.session.get_endpoint(auth or self.auth, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 1225, in get_endpoint > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base return auth.get_endpoint(self, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 380, in get_endpoint > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base allow_version_hack=allow_version_hack, **kwargs) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 271, in get_endpoint_data > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base service_catalog = self.get_access(session).service_catalog > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 134, in get_access > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self.auth_ref = self.get_auth_ref(session) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 206, in get_auth_ref > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base self._plugin = self._do_create_plugin(session) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base File "/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 161, in _do_create_plugin > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base 'auth_url is correct. 
%s' % e) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. SSL exception connecting to https://192.168.10.150:5000: HTTPSConnectionPool(host='192.168.10.150', port=5000): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)) > > 2020-08-25 10:55:43.070 226250 ERROR octavia.network.drivers.neutron.base > > 2020-08-25 10:55:43.074 226250 DEBUG wsme.api [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Client-side error: Subnet 035f3183-f469-415f-b536-b4a81364e814 not found. format_exception /usr/lib/python3.6/site-packages/wsme/api.py:222 > > 2020-08-25 10:55:43.076 226250 DEBUG octavia.common.keystone [req-57c5b37c-e50f-4d50-b535-b0a3d19db1d5 - 8259463ce052437396afa845933afe4b - default default] Request path is / and it does not require keystone authentication process_request /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77 > > 2020-08-25 10:55:43.080 226250 DEBUG octavia.common.keystone [req-5091d326-0cb4-4ae1-bf4b-9ef6b9313dca - - - - -] Request path is / and it does not require keystone authentication process_request /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77 > > > > Configuration: > > [root at controller01 ~]# cat /etc/octavia/octavia.conf > > [DEFAULT] > > > > log_dir = /var/log/octavia > > debug = True > > transport_url = rabbit://openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.178:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.179:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.28:5672 > > > > [api_settings] > > api_base_uri = https://192.168.10.150:9876 > > bind_host = 192.168.10.178 > > bind_port = 9876 > > auth_strategy = keystone > > healthcheck_enabled = True > > allow_tls_terminated_listeners = True > > > > [database] > > connection = mysql+pymysql://octavia:FUkbii8AY4G6H9LxbJ2RRlOzHN61X8PI8FrMcuXQ at 192.168.10.150/octavia > > max_retries = -1 > > > > [health_manager] > > bind_port = 5555 > > bind_ip = 192.168.10.178 > > controller_ip_port_list = 192.168.10.178:5555, 192.168.10.179:5555, 192.168.10.28:5555 > > heartbeat_key = insecure > > > > [keystone_authtoken] > > service_token_roles_required = True > > www_authenticate_uri = https://192.168.10.150:5000 > > auth_url = https://192.168.10.150:5000 > > region_name = Hanoi > > memcached_servers = 192.168.10.178:11211,192.168.10.179:11211,192.168.10.28:11211 > > auth_type = password > > project_domain_name = Default > > user_domain_name = Default > > project_name = service > > username = octavia > > password = esGn3rN3iJOAD2HXmqznFPI9oAY2wQNDWYwqJaCH > > cafile = /etc/ssl/private/haproxy.pem > > insecure = false > > > > > > [certificates] > > cert_generator = local_cert_generator > > #server_certs_key_passphrase = insecure-key-do-not-use-this-key > > ca_private_key_passphrase = esGn3rN3iJOAD2HXmqznFPI9oAY2wQNDWYwqJaCH > > ca_private_key = /etc/octavia/certs/server_ca.key.pem > > ca_certificate = /etc/octavia/certs/server_ca.cert.pem > > region_name = Hanoi > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > endpoint_type = internal > > > > [networking] > > #allow_vip_network_id = True > > #allow_vip_subnet_id = True > > #allow_vip_port_id = True > > > > [haproxy_amphora] > > #bind_port = 9443 > > 
server_ca = /etc/octavia/certs/server_ca.cert.pem > > client_cert = /etc/octavia/certs/client.cert-and-key.pem > > base_path = /var/lib/octavia > > base_cert_dir = /var/lib/octavia/certs > > connection_max_retries = 1500 > > connection_retry_interval = 1 > > > > [controller_worker] > > amp_image_tag = amphora > > amp_ssh_key_name = octavia > > amp_secgroup_list = 80f44b73-dc9f-48aa-a0b8-8b78e5c6585c > > amp_boot_network_list = 04425cb2-5963-48f5-a229-b89b7c6036bd > > amp_flavor_id = 200 > > network_driver = allowed_address_pairs_driver > > compute_driver = compute_nova_driver > > amphora_driver = amphora_haproxy_rest_driver > > client_ca = /etc/octavia/certs/client_ca.cert.pem > > loadbalancer_topology = SINGLE > > amp_active_retries = 9999 > > > > [task_flow] > > [oslo_messaging] > > topic = octavia_prov > > rpc_thread_pool_size = 2 > > > > [house_keeping] > > [amphora_agent] > > [keepalived_vrrp] > > > > [service_auth] > > auth_url = https://192.168.10.150:5000 > > auth_type = password > > project_domain_name = default > > user_domain_name = default > > project_name = admin > > username = admin > > password = F35sXAYW5qDlMGfQbhmexIx12DqrQdpw6ixAseTd > > cafile = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > memcached_servers = 192.168.10.178:11211,192.168.10.179:11211,192.168.10.28:11211 > > #insecure = true > > > > > > [glance] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [neutron] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [cinder] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [nova] > > ca_certificates_file = /etc/ssl/private/haproxy.pem > > region_name = Hanoi > > endpoint_type = internal > > insecure = false > > > > [oslo_policy] > > #policy_file = /etc/octavia/policy.json > > > > [oslo_messaging_notifications] > > transport_url = rabbit://openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.178:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.179:5672,openstack:4ychZAT5VrWlk6KFfgAmpXvGdzfdV8hEpIgOLhyF at 192.168.10.28:5672 > > > > _______________________________________________ > Community mailing list > Community at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/community From Istvan.Szabo at agoda.com Thu Aug 27 03:45:02 2020 From: Istvan.Szabo at agoda.com (Szabo, Istvan (Agoda)) Date: Thu, 27 Aug 2020 03:45:02 +0000 Subject: DB Prune In-Reply-To: References: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> Message-ID: <188b52b4a07b41cab38858c1ae61a7fa@SG-AGMBX-6002.agoda.local> Thank you guys, can do this online or need any outage? -----Original Message----- From: Balázs Gibizer Sent: Wednesday, August 26, 2020 7:15 PM To: Szabo, Istvan (Agoda) Cc: openstack-discuss at lists.openstack.org Subject: Re: DB Prune Email received from outside the company. If in doubt don't click links nor open attachments! ________________________________ On Wed, Aug 26, 2020 at 06:57, "Szabo, Istvan (Agoda)" wrote: > Hi, > > We have a cluster where the user continuously spawn and delete > servers which makes the db even in compressed state 1.1GB. > I’m sure it has a huge amount of trash because this is a cicd > environment and the prod just uses 75MB. > How is it possible to cleanup the db on a safe way, what should be > the steps? 
> From Nova perspective you can get rid of the data of the already deleted instances via the following two commands: nova-manage db archive_deleted_rows nova-manage db purge Cheers, gibi [1]https://docs.openstack.org/nova/latest/cli/nova-manage.html > > Best regards, > Istvan > > > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by > copyright or other legal rules. If you have received it by mistake > please let us know by reply email and delete it from your system. It > is prohibited to copy this message or disclose its content to anyone. > Any confidentiality or privilege is not waived or lost by any mistaken > delivery or unauthorized disclosure of the message. All messages sent > to and from Agoda may be monitored to ensure compliance with company > policies, to protect the company's interests and to remove potential > malware. Electronic messages may be intercepted, amended, lost or > deleted, or contain viruses. ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses. From elfosardo at gmail.com Thu Aug 27 07:52:23 2020 From: elfosardo at gmail.com (Riccardo Pittau) Date: Thu, 27 Aug 2020 09:52:23 +0200 Subject: [ironic] next Victoria meetup In-Reply-To: References: Message-ID: Hello everyone! Thanks to all who cast their vote, after looking at the results I'm happy to announce that the next Ironic Virtual Meetup will be held on: - Monday August 31st, at 1300 UTC until 1500 UTC - Tuesday September 1st, at 1300 UTC until 1500 UTC For latest news and topics, consult the etherpad at https://etherpad.opendev.org/p/Ironic-Victoria-midcycle Can't wait to see to you all at the Meetup, even if just virtually :) A si biri Riccardo On Thu, Aug 20, 2020 at 7:05 PM Riccardo Pittau wrote: > Hello again! > > Friendly reminder about the vote to schedule the next Ironic Virtual > Meetup! > Since a lot of people are on vacation in this period, we've decided to > postpone the final day for the vote to next Wednesday August 26 > > And we have an etherpad now! > https://etherpad.opendev.org/p/Ironic-Victoria-midcycle > Feel free to propose topics, we'll discuss also about the upcoming PTG and > Forum. > > Thanks! > > A si biri > > Riccardo > > > On Mon, Aug 17, 2020 at 6:29 PM Riccardo Pittau > wrote: > >> Hello everyone! >> >> The time for the next Ironic virtual meetup is close! >> It will be an opportunity to review what has been done in the last >> months, exchange ideas and plan for the time before the upcoming victoria >> release, with an eye towards the future. >> >> We're aiming to have the virtual meetup the first week of September >> (Monday August 31 - Friday September 4) and split it in two days, with one >> two-hours slot per day. 
>> Please vote for your best time slots here: >> https://doodle.com/poll/pi4x3kuxamf4nnpu >> >> We're planning to leave the vote open at least for the entire week until >> Friday August 21, so to have enough time to announce the final slots and >> planning early next week. >> >> Thanks! >> >> A si biri >> >> Riccardo >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Thu Aug 27 08:03:49 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Thu, 27 Aug 2020 10:03:49 +0200 Subject: DB Prune In-Reply-To: <188b52b4a07b41cab38858c1ae61a7fa@SG-AGMBX-6002.agoda.local> References: <859fb3c996514c2ead3fc1ce3de4210b@SG-AGMBX-6002.agoda.local> <188b52b4a07b41cab38858c1ae61a7fa@SG-AGMBX-6002.agoda.local> Message-ID: On Thu, Aug 27, 2020 at 03:45, "Szabo, Istvan (Agoda)" wrote: > Thank you guys, can do this online or need any outage? I think it is safe to run these commands while the nova services are up. However if you have a lot of data to move and then delete that can cause extra DB load. Cheers, gibi > > -----Original Message----- > From: Balázs Gibizer > Sent: Wednesday, August 26, 2020 7:15 PM > To: Szabo, Istvan (Agoda) > Cc: openstack-discuss at lists.openstack.org > Subject: Re: DB Prune > > Email received from outside the company. If in doubt don't click > links nor open attachments! > ________________________________ > > On Wed, Aug 26, 2020 at 06:57, "Szabo, Istvan (Agoda)" > wrote: >> Hi, >> >> We have a cluster where the user continuously spawn and delete >> servers which makes the db even in compressed state 1.1GB. >> I’m sure it has a huge amount of trash because this is a cicd >> environment and the prod just uses 75MB. >> How is it possible to cleanup the db on a safe way, what should be >> the steps? >> > > From Nova perspective you can get rid of the data of the already > deleted instances via the following two commands: > > nova-manage db archive_deleted_rows > nova-manage db purge > > Cheers, > gibi > > [1]https://docs.openstack.org/nova/latest/cli/nova-manage.html > > >> >> Best regards, >> Istvan >> >> >> This message is confidential and is for the sole use of the intended >> recipient(s). It may also be privileged or otherwise protected by >> copyright or other legal rules. If you have received it by mistake >> please let us know by reply email and delete it from your system. It >> is prohibited to copy this message or disclose its content to >> anyone. >> Any confidentiality or privilege is not waived or lost by any >> mistaken >> delivery or unauthorized disclosure of the message. All messages >> sent >> to and from Agoda may be monitored to ensure compliance with company >> policies, to protect the company's interests and to remove potential >> malware. Electronic messages may be intercepted, amended, lost or >> deleted, or contain viruses. > > > > ________________________________ > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by > copyright or other legal rules. If you have received it by mistake > please let us know by reply email and delete it from your system. It > is prohibited to copy this message or disclose its content to anyone. > Any confidentiality or privilege is not waived or lost by any > mistaken delivery or unauthorized disclosure of the message. 
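As a side note on the nova-manage commands discussed earlier in this DB Prune thread, here is a minimal sketch of a batched cleanup; the batch size and cut-off date below are illustrative assumptions, not recommendations, and both commands are best tried against a copy of the database first:

    # move soft-deleted rows into the shadow tables in batches
    nova-manage db archive_deleted_rows --max_rows 1000 --until-complete --verbose
    # then delete already-archived rows older than an (example) date from the shadow tables
    nova-manage db purge --before "2020-07-01" --verbose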
All > messages sent to and from Agoda may be monitored to ensure compliance > with company policies, to protect the company's interests and to > remove potential malware. Electronic messages may be intercepted, > amended, lost or deleted, or contain viruses. From mark at stackhpc.com Thu Aug 27 08:08:44 2020 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 27 Aug 2020 09:08:44 +0100 Subject: [kolla] Focal upgrade Message-ID: Hi, For the Victoria release we will be moving our Ubuntu support from Bionic 18.04 to the Focal 20.04 LTS release. This applies to both the base container image and host OS. We would like to request feedback from any Ubuntu users about how they typically deal with a distro upgrade like this. I would assume that the following workflow would be used: 1. start with a Ussuri release on Bionic 2. distro upgrade to Focal 3. OpenStack upgrade to Victoria However, that would imply that it would not be possible to make any more changes to the Ussuri deploy after the Focal upgrade, since Kolla Ansible Ussuri release does not support Focal (it is blocked by prechecks). An alternative approach is: 1. start with a Ussuri release on Bionic 2. OpenStack upgrade to Victoria 3. distro upgrade to Focal This implies that Victoria must support both Bionic and Focal as a host OS, which it currently does. This flow matches more closely what we are currently testing in CI (steps 1 and 2 only). In both cases, Victoria container images are based on Focal. Feedback on this would be appreciated. Thanks, Mark From ssbarnea at redhat.com Thu Aug 27 08:11:45 2020 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Thu, 27 Aug 2020 09:11:45 +0100 Subject: Do you want to render ANSI in Zuul console? Message-ID: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> At this moment Zuul web interfaces displays output of commands as raw, so any ANSI terminal output will display ugly artifacts. I tried enabling ANSI about half a year ago but even after providing two different implementations, I was not able to popularize it enough. As this is a UX related feature, I think would like more appropriate to ask for feedback from openstack-discuss, likely the biggest consumer of zuul web interface. Please comment/+/- on review below even if you are not a zuul core. At least it should show if this is a desired feature to have or not: https://review.opendev.org/#/c/739444/ ✅ This review also includes a screenshot that shows how the rendering looks (an alternative for using the sitepreview) Thanks Sorin Sbarnea From xin-ran.wang at intel.com Thu Aug 27 09:50:17 2020 From: xin-ran.wang at intel.com (Wang, Xin-ran) Date: Thu, 27 Aug 2020 09:50:17 +0000 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver In-Reply-To: References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com> Message-ID: Hi all, According to our discussion on PTG and recent discussion by mailing list. We have an agreement on using wiki to store the test report for the device drivers in the case that they do not have 3rd Party CI at present. Please see the wiki page here: https://wiki.openstack.org/wiki/Cyborg/TestReport. Currently, there is one test report, other contributor who wants to upstream a device driver in Cyborg and who do not have the condition to hold a 3rd party CI can refer to this test report and give us your report when upstreaming. 
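Picking up the Kolla Focal upgrade thread above, the second workflow (upgrade OpenStack first, then the host OS) might look roughly like the following. This is only a sketch under assumptions (inventory at /etc/kolla/multinode, hosts upgraded one at a time), so check the Kolla Ansible upgrade documentation for your release before relying on it:

    # upgrade OpenStack from Ussuri to Victoria while still on Bionic
    kolla-ansible -i /etc/kolla/multinode prechecks
    kolla-ansible -i /etc/kolla/multinode pull
    kolla-ansible -i /etc/kolla/multinode upgrade
    # then move the host OS from Bionic 18.04 to Focal 20.04, node by node
    do-release-upgrade
    # re-run bootstrap-servers/deploy afterwards to confirm hosts and containers are healthy
    kolla-ansible -i /etc/kolla/multinode bootstrap-servers
    kolla-ansible -i /etc/kolla/multinode deploy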
Reference: https://wiki.openstack.org/wiki/Cyborg/TestReport/IntelQAT Thanks, Xin-Ran -----Original Message----- From: Brin Zhang(张百林) Sent: Saturday, July 11, 2020 9:42 AM To: smooney at redhat.com; yumeng_bao at yahoo.com; openstack-discuss at lists.openstack.org Subject: 答复: [cyborg] Temporary treatment plan for the 3rd-party driver On Fri, 2020-07-10 at 13:37 +0800, yumeng bao wrote: > Brin, thanks for bringing this up! > > > Hi all: > > This release we want to introduce some 3rd party drivers > > (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > > Due to the lack of CI test environment supported by hardware, > > we reached a temporary solution in two ways, as > > follows: > > 1. Provide a CI environment and provide a tempest test for Cyborg, > > this method is recommended; 2. If there is no CI environment, please > > provide the test results of this driver in the master branch or in > > the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. > > Providing test result can be our option. The test result can be part > of the driver documentation[0] as this is public to users. > And from my understanding, the test result should work as the role of > tempest case and clarify at least: necessary configuration,test operations and test results. > i would advise against including the resulsts in docuemntation add int test results to a commit or provideing tiem at the poitn it merged just tells you it once worked on the developers system likely using devstack to deploy. it does not tell you that it still work after even a singel addtional commit has been merged. so i would sugges not adding the results to the docs as they will get out dateded quickly. Good advice, this is also my original intention. Give the result verification in the submitted commit, and do not put the test verification result in the code base. As you said, this does not mean that it will always work unless a test report can be provided regularly. Of course, it is better if there is a third-party CI , we will try our best to fight for it. > maintaining a wiki is fine but i woudl suggest considring any driver that does not have first or thirdparty ci to be experimental. the generic mdev driver we talked about can be tested using sampel kernel modules that provide realy mdevs implemnetaion of srial consoles or graphics devices. so it could be validated in first party ci and consider supported/non experimaental. if other driver can similarly be tested with virtual hardware or sample kernel modules that allowed testing in the first party ci they could alos be marked as fully supported. with out that level of testing however i would not advertise a driver as anything more then experimental. > the old rule when i started working on openstack was if its not tested in ci its broken. 
> > [0] > https://docs.openstack.org/cyborg/latest/reference/support-matrix.html > #driver-support > > > > [1] > > http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openst > > ack_cyborg.2020-07-02-03.05.log.html > > Regards, > Yumeng > From zhangbailin at inspur.com Thu Aug 27 11:19:20 2020 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Thu, 27 Aug 2020 11:19:20 +0000 Subject: =?utf-8?B?562U5aSNOiBbY3lib3JnXSBUZW1wb3JhcnkgdHJlYXRtZW50IHBsYW4gZm9y?= =?utf-8?Q?_the_3rd-party_driver?= References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com> Message-ID: <8d43c413b4564e1c9d5ad67e53dbd5a3@inspur.com> Hi all. In today's IRC meeting [1], we decide to have a wiki to maintain the 3-rd-party drivers temporary test results, like Intel QAT driver test result [2], and we also need to maintain the Driver Support docs [3], add "Temporary Test Result" as a column in the Driver Support list, we should mark the result added time, such as the QAT driver result, may we can say "This test results reported at Aug. 2020 in Victoria Release, please reference https://wiki.openstack.org/wiki/Cyborg/TestReport/IntelQAT". In the Driver Support part, we will claim the "Temporary Test Result" is a temporary result, it will not always work. If you encounter problems during the adaptation process, please contact the Cyborg Core Team [4] for help. [1] http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2020-08-27.log.html#t2020-08-27T03:11:40 [2] https://wiki.openstack.org/wiki/Cyborg/TestReport/IntelQAT [3] https://docs.openstack.org/cyborg/latest/reference/support-matrix.html#driver-support [4] https://review.opendev.org/#/admin/groups/1243,members brinzhang -----邮件原件----- 发件人: Brin Zhang(张百林) 发送时间: 2020年7月11日 9:41 收件人: 'smooney at redhat.com' ; 'yumeng_bao at yahoo.com' ; 'openstack-discuss at lists.openstack.org' 主题: 答复: [cyborg] Temporary treatment plan for the 3rd-party driver On Fri, 2020-07-10 at 13:37 +0800, yumeng bao wrote: > Brin, thanks for bringing this up! > > > Hi all: > > This release we want to introduce some 3rd party drivers > > (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > > Due to the lack of CI test environment supported by hardware, > > we reached a temporary solution in two ways, as > > follows: > > 1. Provide a CI environment and provide a tempest test for Cyborg, > > this method is recommended; 2. If there is no CI environment, please > > provide the test results of this driver in the master branch or in > > the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. > > Providing test result can be our option. The test result can be part > of the driver documentation[0] as this is public to users. > And from my understanding, the test result should work as the role of > tempest case and clarify at least: necessary configuration,test operations and test results. > i would advise against including the resulsts in docuemntation add int test results to a commit or provideing tiem at the poitn it merged just tells you it once worked on the developers system likely using devstack to deploy. it does not tell you that it still work after even a singel addtional commit has been merged. 
so i would sugges not adding the results to the docs as they will get out dateded quickly. Good advice, this is also my original intention. Give the result verification in the submitted commit, and do not put the test verification result in the code base. As you said, this does not mean that it will always work unless a test report can be provided regularly. Of course, it is better if there is a third-party CI , we will try our best to fight for it. > maintaining a wiki is fine but i woudl suggest considring any driver that does not have first or thirdparty ci to be experimental. the generic mdev driver we talked about can be tested using sampel kernel modules that provide realy mdevs implemnetaion of srial consoles or graphics devices. so it could be validated in first party ci and consider supported/non experimaental. if other driver can similarly be tested with virtual hardware or sample kernel modules that allowed testing in the first party ci they could alos be marked as fully supported. with out that level of testing however i would not advertise a driver as anything more then experimental. > the old rule when i started working on openstack was if its not tested in ci its broken. > > [0] > https://docs.openstack.org/cyborg/latest/reference/support-matrix.html > #driver-support > > > > [1] > > http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openst > > ack_cyborg.2020-07-02-03.05.log.html > > Regards, > Yumeng > From CAPSEY at augusta.edu Thu Aug 27 14:37:40 2020 From: CAPSEY at augusta.edu (Apsey, Christopher) Date: Thu, 27 Aug 2020 14:37:40 +0000 Subject: [neutron][ovn] OVN Performance Message-ID: All, I know that OVN is going to become the default neutron backend at some point and displace linuxbridge as the default configuration option in the docs, but we have noticed a pretty significant performance disparity between OVN and linuxbridge on identical hardware over the past year or so in a few different environments[1]. I know that example is unscientific, but similar results have been borne out in many different scenarios from what we have observed. There are three main problems from what we see: 1. OVN does not handle large concurrent requests as well as linuxbridge. Additionally, linuxbridge concurrent capacity grows (not linearly, but grows nonetheless) by adding additional neutron API endpoints and RPC agents. OVN does not really horizontally scale by adding additional API endpoints, from what we have observed. 2. OVN gets significantly slower as load on the system grows. We have observed a soft cap of about 2000-2500 instances in a given deployment before ovn-backed neutron stops responding altogether to nova requests (even for booting a single instance). We have observed linuxbridge get to 5000+ instances before it starts to struggle on the same hardware (and we think that linuxbridge can go further with improved provider network design in that particular case). 3. Once the southbound database process hits 100% CPU usage on the leader in the ovn cluster, it’s game over (probably causes 1+2) It's entirely possible that we just don’t understand OVN well enough to tune it [2][3][4], but then the question becomes how do we get that tuning knowledge into the docs so people don’t scratch their heads when their cool new OVN deployment scales 40% as well as their ancient linuxbridge-based one? If it is ‘known’ that OVN has some scaling challenges, is there a plan to fix it, and what is the best way to contribute to doing so? 
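As a quick way to confirm whether the southbound ovsdb-server is the bottleneck described in point 3 above, something like the following can be run on the OVN database hosts. The process name and control-socket path are assumptions for a typical packaged install and may differ in other deployments:

    # watch CPU usage of the southbound ovsdb-server process
    pidstat -u 5 -p "$(pgrep -f ovnsb_db)"
    # check RAFT cluster health and which member is currently the leader
    ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
    # rough feel for southbound DB load: the logical flow count grows with ports and ACLs
    ovn-sbctl lflow-list | wc -l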
We have observed similar results on Ubuntu 18.04/20.04 and CentOS 7/8 on Stein, Train, and Ussuri. [1] https://pastebin.com/kyyURTJm [2] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/ovsdb [3] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/neutron [4] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/compute Chris Apsey GEORGIA CYBER CENTER -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Aug 27 15:10:30 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Aug 2020 16:10:30 +0100 Subject: [neutron][ovn] OVN Performance In-Reply-To: References: Message-ID: <29d70bbb4eeb330c435ae600d14aa8cfd627d696.camel@redhat.com> On Thu, 2020-08-27 at 14:37 +0000, Apsey, Christopher wrote: > All, > > I know that OVN is going to become the default neutron backend at some point and displace linuxbridge as the default > configuration option in the docs, but we have noticed a pretty significant performance disparity between OVN and > linuxbridge on identical hardware over the past year or so in a few different environments[1]. the default backend in the docs is not linux bridge right now is it. i tought i has been ml2/ovs for many years. > I know that example is unscientific, but similar results have been borne out in many different scenarios from what > we have observed. There are three main problems from what we see: > > > 1. OVN does not handle large concurrent requests as well as linuxbridge. Additionally, linuxbridge concurrent > capacity grows (not linearly, but grows nonetheless) by adding additional neutron API endpoints and RPC agents. OVN > does not really horizontally scale by adding additional API endpoints, from what we have observed. > > 2. OVN gets significantly slower as load on the system grows. We have observed a soft cap of about 2000-2500 > instances in a given deployment before ovn-backed neutron stops responding altogether to nova requests (even for > booting a single instance). We have observed linuxbridge get to 5000+ instances before it starts to struggle on the > same hardware (and we think that linuxbridge can go further with improved provider network design in that particular > case). > > 3. Once the southbound database process hits 100% CPU usage on the leader in the ovn cluster, it’s game over > (probably causes 1+2) > > It's entirely possible that we just don’t understand OVN well enough to tune it [2][3][4], but then the question > becomes how do we get that tuning knowledge into the docs so people don’t scratch their heads when their cool new OVN > deployment scales 40% as well as their ancient linuxbridge-based one? > > If it is ‘known’ that OVN has some scaling challenges, is there a plan to fix it, and what is the best way to > contribute to doing so? > > We have observed similar results on Ubuntu 18.04/20.04 and CentOS 7/8 on Stein, Train, and Ussuri. 
> > [1] https://pastebin.com/kyyURTJm > [2] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/ovsdb > [3] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/neutron > [4] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/compute > > Chris Apsey > GEORGIA CYBER CENTER > From amuller at redhat.com Thu Aug 27 15:17:45 2020 From: amuller at redhat.com (Assaf Muller) Date: Thu, 27 Aug 2020 11:17:45 -0400 Subject: [neutron][ovn] OVN Performance In-Reply-To: References: Message-ID: The most efficient way about this is to give one or more of the Engineers working on OpenStack OVN upstream (I've added a few to this thread) temporary access to an environment that can reproduce issues you're seeing, we could then document the issues and work towards solutions. If that's not possible, if you could provide reproducer scripts, or alternatively sharpen the reproduction method, we'll take a look. What you've described is not something that's 'acceptable', OVN should definitely not scale worse than Neutron with the Linux Bridge agent. It's possible that the particular issues you ran in to is something that we've already seen internally at Red Hat, or with our customers, and we're already working on fixes in future versions of OVN - I can't tell you until you elaborate on the details of the issues you're seeing. In any case, the upstream community is committed to improving OVN scale and fixing scale issues as they pop up. Coincidentally, Red Hat scale engineers just published an article [1] about work they've done to scale RH-OSP 16.1 (== OpenStack Train on CentOS 8, with OVN 2.13 and TripleO) to 700 compute nodes. [1] https://www.redhat.com/en/blog/scaling-red-hat-openstack-platform-161-more-700-nodes?source=bloglisting On Thu, Aug 27, 2020 at 10:44 AM Apsey, Christopher wrote: > > All, > > > > I know that OVN is going to become the default neutron backend at some point and displace linuxbridge as the default configuration option in the docs, but we have noticed a pretty significant performance disparity between OVN and linuxbridge on identical hardware over the past year or so in a few different environments[1]. I know that example is unscientific, but similar results have been borne out in many different scenarios from what we have observed. There are three main problems from what we see: > > > > 1. OVN does not handle large concurrent requests as well as linuxbridge. Additionally, linuxbridge concurrent capacity grows (not linearly, but grows nonetheless) by adding additional neutron API endpoints and RPC agents. OVN does not really horizontally scale by adding additional API endpoints, from what we have observed. > > 2. OVN gets significantly slower as load on the system grows. We have observed a soft cap of about 2000-2500 instances in a given deployment before ovn-backed neutron stops responding altogether to nova requests (even for booting a single instance). We have observed linuxbridge get to 5000+ instances before it starts to struggle on the same hardware (and we think that linuxbridge can go further with improved provider network design in that particular case). > > 3. 
Once the southbound database process hits 100% CPU usage on the leader in the ovn cluster, it’s game over (probably causes 1+2) > > > > It's entirely possible that we just don’t understand OVN well enough to tune it [2][3][4], but then the question becomes how do we get that tuning knowledge into the docs so people don’t scratch their heads when their cool new OVN deployment scales 40% as well as their ancient linuxbridge-based one? > > > > If it is ‘known’ that OVN has some scaling challenges, is there a plan to fix it, and what is the best way to contribute to doing so? > > > > We have observed similar results on Ubuntu 18.04/20.04 and CentOS 7/8 on Stein, Train, and Ussuri. > > > > [1] https://pastebin.com/kyyURTJm > > [2] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/ovsdb > > [3] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/neutron > > [4] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/compute > > > > Chris Apsey > > GEORGIA CYBER CENTER > > From CAPSEY at augusta.edu Thu Aug 27 15:20:04 2020 From: CAPSEY at augusta.edu (Apsey, Christopher) Date: Thu, 27 Aug 2020 15:20:04 +0000 Subject: [EXTERNAL] Re: [neutron][ovn] OVN Performance In-Reply-To: <29d70bbb4eeb330c435ae600d14aa8cfd627d696.camel@redhat.com> References: <29d70bbb4eeb330c435ae600d14aa8cfd627d696.camel@redhat.com> Message-ID: > the default backend in the docs is not linux bridge right now is it. > i tought i has been ml2/ovs for many years. Nope – still defaults to linuxbridge on master - https://docs.openstack.org/neutron/latest/install/controller-install-rdo.html. And I don’t think that’s necessarily a bad thing if it’s the simplest option to get working well at the moment, but if the future is OVN, OVN should be at least as good in all respects. Chris Apsey GEORGIA CYBER CENTER From: Sean Mooney Sent: Thursday, August 27, 2020 11:11 AM To: Apsey, Christopher ; openstack-discuss at lists.openstack.org Subject: [EXTERNAL] Re: [neutron][ovn] OVN Performance CAUTION: EXTERNAL SENDER This email originated from an external source. Please exercise caution before opening attachments, clicking links, replying, or providing information to the sender. If you believe it to be fraudulent, contact the AU Cybersecurity Hotline at 72-CYBER (2-9237 / 706-722-9237) or 72CYBER at augusta.edu On Thu, 2020-08-27 at 14:37 +0000, Apsey, Christopher wrote: > All, > > I know that OVN is going to become the default neutron backend at some point and displace linuxbridge as the default > configuration option in the docs, but we have noticed a pretty significant performance disparity between OVN and > linuxbridge on identical hardware over the past year or so in a few different environments[1]. the default backend in the docs is not linux bridge right now is it. i tought i has been ml2/ovs for many years. > I know that example is unscientific, but similar results have been borne out in many different scenarios from what > we have observed. There are three main problems from what we see: > > > 1. OVN does not handle large concurrent requests as well as linuxbridge. Additionally, linuxbridge concurrent > capacity grows (not linearly, but grows nonetheless) by adding additional neutron API endpoints and RPC agents. OVN > does not really horizontally scale by adding additional API endpoints, from what we have observed. > > 2. OVN gets significantly slower as load on the system grows. 
We have observed a soft cap of about 2000-2500 > instances in a given deployment before ovn-backed neutron stops responding altogether to nova requests (even for > booting a single instance). We have observed linuxbridge get to 5000+ instances before it starts to struggle on the > same hardware (and we think that linuxbridge can go further with improved provider network design in that particular > case). > > 3. Once the southbound database process hits 100% CPU usage on the leader in the ovn cluster, it’s game over > (probably causes 1+2) > > It's entirely possible that we just don’t understand OVN well enough to tune it [2][3][4], but then the question > becomes how do we get that tuning knowledge into the docs so people don’t scratch their heads when their cool new OVN > deployment scales 40% as well as their ancient linuxbridge-based one? > > If it is ‘known’ that OVN has some scaling challenges, is there a plan to fix it, and what is the best way to > contribute to doing so? > > We have observed similar results on Ubuntu 18.04/20.04 and CentOS 7/8 on Stein, Train, and Ussuri. > > [1] https://pastebin.com/kyyURTJm > [2] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/ovsdb > [3] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/neutron > [4] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/compute > > Chris Apsey > GEORGIA CYBER CENTER > -------------- next part -------------- An HTML attachment was scrubbed... URL: From CAPSEY at augusta.edu Thu Aug 27 15:32:37 2020 From: CAPSEY at augusta.edu (Apsey, Christopher) Date: Thu, 27 Aug 2020 15:32:37 +0000 Subject: [EXTERNAL] Re: [neutron][ovn] OVN Performance In-Reply-To: References: Message-ID: Assaf, We can absolutely support engineering poking around in our environment (and possibly an even larger one at my previous employer that was experiencing similar issues during testing). We can take this offline so we don’t spam the mailing list. Just let me know how to proceed, Thanks! Chris Apsey GEORGIA CYBER CENTER From: Assaf Muller Sent: Thursday, August 27, 2020 11:18 AM To: Apsey, Christopher Cc: openstack-discuss at lists.openstack.org; Lucas Alvares Gomes Martins ; Jakub Libosvar ; Daniel Alvarez Sanchez Subject: [EXTERNAL] Re: [neutron][ovn] OVN Performance CAUTION: EXTERNAL SENDER This email originated from an external source. Please exercise caution before opening attachments, clicking links, replying, or providing information to the sender. If you believe it to be fraudulent, contact the AU Cybersecurity Hotline at 72-CYBER (2-9237 / 706-722-9237) or 72CYBER at augusta.edu The most efficient way about this is to give one or more of the Engineers working on OpenStack OVN upstream (I've added a few to this thread) temporary access to an environment that can reproduce issues you're seeing, we could then document the issues and work towards solutions. If that's not possible, if you could provide reproducer scripts, or alternatively sharpen the reproduction method, we'll take a look. What you've described is not something that's 'acceptable', OVN should definitely not scale worse than Neutron with the Linux Bridge agent. It's possible that the particular issues you ran in to is something that we've already seen internally at Red Hat, or with our customers, and we're already working on fixes in future versions of OVN - I can't tell you until you elaborate on the details of the issues you're seeing. 
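To make the reproducer-script idea above concrete, here is a deliberately small sketch of the kind of concurrent control-plane load that could serve as a starting point; the counts, resource names, flavor and image are made up for illustration and would need tuning for a real test:

    #!/bin/bash
    # create networks, subnets and ports concurrently to load neutron-server
    for i in $(seq 1 50); do
      (
        openstack network create "perf-net-$i" >/dev/null
        openstack subnet create --network "perf-net-$i" \
            --subnet-range "10.$i.0.0/24" "perf-subnet-$i" >/dev/null
        openstack port create --network "perf-net-$i" "perf-port-$i" >/dev/null
      ) &
    done
    wait
    # then time a single boot while the extra ports are in place
    time openstack server create --flavor m1.small --image cirros \
        --network perf-net-1 --wait perf-vm-1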
In any case, the upstream community is committed to improving OVN scale and fixing scale issues as they pop up. Coincidentally, Red Hat scale engineers just published an article [1] about work they've done to scale RH-OSP 16.1 (== OpenStack Train on CentOS 8, with OVN 2.13 and TripleO) to 700 compute nodes. [1] https://www.redhat.com/en/blog/scaling-red-hat-openstack-platform-161-more-700-nodes?source=bloglisting On Thu, Aug 27, 2020 at 10:44 AM Apsey, Christopher > wrote: > > All, > > > > I know that OVN is going to become the default neutron backend at some point and displace linuxbridge as the default configuration option in the docs, but we have noticed a pretty significant performance disparity between OVN and linuxbridge on identical hardware over the past year or so in a few different environments[1]. I know that example is unscientific, but similar results have been borne out in many different scenarios from what we have observed. There are three main problems from what we see: > > > > 1. OVN does not handle large concurrent requests as well as linuxbridge. Additionally, linuxbridge concurrent capacity grows (not linearly, but grows nonetheless) by adding additional neutron API endpoints and RPC agents. OVN does not really horizontally scale by adding additional API endpoints, from what we have observed. > > 2. OVN gets significantly slower as load on the system grows. We have observed a soft cap of about 2000-2500 instances in a given deployment before ovn-backed neutron stops responding altogether to nova requests (even for booting a single instance). We have observed linuxbridge get to 5000+ instances before it starts to struggle on the same hardware (and we think that linuxbridge can go further with improved provider network design in that particular case). > > 3. Once the southbound database process hits 100% CPU usage on the leader in the ovn cluster, it’s game over (probably causes 1+2) > > > > It's entirely possible that we just don’t understand OVN well enough to tune it [2][3][4], but then the question becomes how do we get that tuning knowledge into the docs so people don’t scratch their heads when their cool new OVN deployment scales 40% as well as their ancient linuxbridge-based one? > > > > If it is ‘known’ that OVN has some scaling challenges, is there a plan to fix it, and what is the best way to contribute to doing so? > > > > We have observed similar results on Ubuntu 18.04/20.04 and CentOS 7/8 on Stein, Train, and Ussuri. > > > > [1] https://pastebin.com/kyyURTJm > > [2] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/ovsdb > > [3] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/neutron > > [4] https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/compute > > > > Chris Apsey > > GEORGIA CYBER CENTER > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Aug 27 15:37:28 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 27 Aug 2020 08:37:28 -0700 Subject: Do you want to render ANSI in Zuul console? In-Reply-To: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> References: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> Message-ID: On Thu, Aug 27, 2020, at 1:11 AM, Sorin Sbarnea wrote: > At this moment Zuul web interfaces displays output of commands as raw, > so any ANSI terminal output will display ugly artifacts. > > I tried enabling ANSI about half a year ago but even after providing > two different implementations, I was not able to popularize it enough. 
> > > As this is a UX related feature, I think would like more appropriate to > ask for feedback from openstack-discuss, likely the biggest consumer of > zuul web interface. > > Please comment/+/- on review below even if you are not a zuul core. At > least it should show if this is a desired feature to have or not: Without my Zuul hat on but with my "I debug a lot of openstack jobs" hat I would prefer we remove ansi color controls from our log files entirely. They make using grep and other machine processing tools more difficult. I find the utility of grep, ^F, elasticsearch, and the log level severity filtering far more useful than scrolling and looking for colors that may be arbitrarily applied by the source. > > https://review.opendev.org/#/c/739444/ ✅ > > This review also includes a screenshot that shows how the rendering > looks (an alternative for using the sitepreview) > > Thanks > Sorin Sbarnea > > > From smooney at redhat.com Thu Aug 27 16:22:38 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Aug 2020 17:22:38 +0100 Subject: [EXTERNAL] Re: [neutron][ovn] OVN Performance In-Reply-To: References: <29d70bbb4eeb330c435ae600d14aa8cfd627d696.camel@redhat.com> Message-ID: <8ea879bb62bf6ec8f398b9036cbb105e5bd9ff64.camel@redhat.com> On Thu, 2020-08-27 at 15:20 +0000, Apsey, Christopher wrote: > > the default backend in the docs is not linux bridge right now is it. > > i tought i has been ml2/ovs for many years. > > Nope – still defaults to linuxbridge on master - > https://docs.openstack.org/neutron/latest/install/controller-install-rdo.html. > its not the default we use in devstack so its got much less testign then ovs so im surpised to see our docs decaulting to it. im not sure if any openstack installer default to linux bridge and i know we have had trouble in the past maintaining it when bugs arise. so its simple yes but not the best maintained or developed driver. i would be concerned that new people that deploy would hit bugs and not be able to find support on irc or the mailinglist but i guss that is more of a first contact problem then related to your ovn issue.. > And I don’t think that’s necessarily a bad thing if it’s the simplest option to get working well at the moment, but if > the future is OVN, OVN should be at least as good in all respects. > > Chris Apsey > GEORGIA CYBER CENTER > > From: Sean Mooney > Sent: Thursday, August 27, 2020 11:11 AM > To: Apsey, Christopher ; openstack-discuss at lists.openstack.org > Subject: [EXTERNAL] Re: [neutron][ovn] OVN Performance > > CAUTION: EXTERNAL SENDER This email originated from an external source. Please exercise caution before opening > attachments, clicking links, replying, or providing information to the sender. If you believe it to be fraudulent, > contact the AU Cybersecurity Hotline at 72-CYBER (2-9237 / 706-722-9237) or 72CYBER at augusta.edu 72CYBER at augusta.edu> > > On Thu, 2020-08-27 at 14:37 +0000, Apsey, Christopher wrote: > > All, > > > > I know that OVN is going to become the default neutron backend at some point and displace linuxbridge as the default > > configuration option in the docs, but we have noticed a pretty significant performance disparity between OVN and > > linuxbridge on identical hardware over the past year or so in a few different environments[1]. > > the default backend in the docs is not linux bridge right now is it. > i tought i has been ml2/ovs for many years. 
> > I know that example is unscientific, but similar results have been borne out in many different scenarios from what > > we have observed. There are three main problems from what we see: > > > > > > 1. OVN does not handle large concurrent requests as well as linuxbridge. Additionally, linuxbridge concurrent > > capacity grows (not linearly, but grows nonetheless) by adding additional neutron API endpoints and RPC agents. OVN > > does not really horizontally scale by adding additional API endpoints, from what we have observed. > > > > 2. OVN gets significantly slower as load on the system grows. We have observed a soft cap of about 2000-2500 > > instances in a given deployment before ovn-backed neutron stops responding altogether to nova requests (even for > > booting a single instance). We have observed linuxbridge get to 5000+ instances before it starts to struggle on the > > same hardware (and we think that linuxbridge can go further with improved provider network design in that particular > > case). > > > > 3. Once the southbound database process hits 100% CPU usage on the leader in the ovn cluster, it’s game over > > (probably causes 1+2) > > > > It's entirely possible that we just don’t understand OVN well enough to tune it [2][3][4], but then the question > > becomes how do we get that tuning knowledge into the docs so people don’t scratch their heads when their cool new > > OVN > > deployment scales 40% as well as their ancient linuxbridge-based one? > > > > If it is ‘known’ that OVN has some scaling challenges, is there a plan to fix it, and what is the best way to > > contribute to doing so? > > > > We have observed similar results on Ubuntu 18.04/20.04 and CentOS 7/8 on Stein, Train, and Ussuri. > > > > [1] https://pastebin.com/kyyURTJm; > > [2] > > https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/ovsdb > > ; > > [3] > > https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/neutron > > ; > > [4] > > https://github.com/GeorgiaCyber/kinetic/tree/master/formulas/compute > > ; > > > > Chris Apsey > > GEORGIA CYBER CENTER > > From smooney at redhat.com Thu Aug 27 16:24:58 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Aug 2020 17:24:58 +0100 Subject: Do you want to render ANSI in Zuul console? In-Reply-To: References: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> Message-ID: <16d3afb3557c4ab745a3b244abb2d94b21c8d149.camel@redhat.com> On Thu, 2020-08-27 at 08:37 -0700, Clark Boylan wrote: > On Thu, Aug 27, 2020, at 1:11 AM, Sorin Sbarnea wrote: > > At this moment Zuul web interfaces displays output of commands as raw, > > so any ANSI terminal output will display ugly artifacts. > > > > I tried enabling ANSI about half a year ago but even after providing > > two different implementations, I was not able to popularize it enough. > > > > > > As this is a UX related feature, I think would like more appropriate to > > ask for feedback from openstack-discuss, likely the biggest consumer of > > zuul web interface. > > > > Please comment/+/- on review below even if you are not a zuul core. At > > least it should show if this is a desired feature to have or not: > > Without my Zuul hat on but with my "I debug a lot of openstack jobs" hat I would prefer we remove ansi color controls > from our log files entirely. They make using grep and other machine processing tools more difficult. 
I find the > utility of grep, ^F, elasticsearch, and the log level severity filtering far more useful than scrolling and looking > for colors that may be arbitrarily applied by the source. if we can remove them form the logs but use a javascpit lib in the viewer to still highlight thing that might be the best of both worlds i do fine the syntax hyilighign nice but we dont need color codes to do that. > > > > > https://review.opendev.org/#/c/739444/ ✅ > > > > This review also includes a screenshot that shows how the rendering > > looks (an alternative for using the sitepreview) > > > > Thanks > > Sorin Sbarnea > > > > > > > > From smooney at redhat.com Thu Aug 27 16:26:04 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 27 Aug 2020 17:26:04 +0100 Subject: Do you want to render ANSI in Zuul console? In-Reply-To: <16d3afb3557c4ab745a3b244abb2d94b21c8d149.camel@redhat.com> References: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> <16d3afb3557c4ab745a3b244abb2d94b21c8d149.camel@redhat.com> Message-ID: On Thu, 2020-08-27 at 17:24 +0100, Sean Mooney wrote: > On Thu, 2020-08-27 at 08:37 -0700, Clark Boylan wrote: > > On Thu, Aug 27, 2020, at 1:11 AM, Sorin Sbarnea wrote: > > > At this moment Zuul web interfaces displays output of commands as raw, > > > so any ANSI terminal output will display ugly artifacts. > > > > > > I tried enabling ANSI about half a year ago but even after providing > > > two different implementations, I was not able to popularize it enough. > > > > > > > > > As this is a UX related feature, I think would like more appropriate to > > > ask for feedback from openstack-discuss, likely the biggest consumer of > > > zuul web interface. > > > > > > Please comment/+/- on review below even if you are not a zuul core. At > > > least it should show if this is a desired feature to have or not: > > > > Without my Zuul hat on but with my "I debug a lot of openstack jobs" hat I would prefer we remove ansi color > > controls > > from our log files entirely. They make using grep and other machine processing tools more difficult. I find the > > utility of grep, ^F, elasticsearch, and the log level severity filtering far more useful than scrolling and looking > > for colors that may be arbitrarily applied by the source. > > if we can remove them form the logs but use a javascpit lib in the viewer to still highlight thing that might be the > best of both worlds > i do fine the syntax hyilighign nice but we dont need color codes to do that. i ment to say i have had some success with https://highlightjs.org/ before for that use case mainly in blogs but it might be a solution. > > > > > > > > https://review.opendev.org/#/c/739444/ ✅ > > > > > > This review also includes a screenshot that shows how the rendering > > > looks (an alternative for using the sitepreview) > > > > > > Thanks > > > Sorin Sbarnea > > > > > > > > > > > > > > > From sean.mcginnis at gmx.com Thu Aug 27 17:06:51 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 27 Aug 2020 12:06:51 -0500 Subject: [ops] Restructuring OSOPS tools Message-ID: Hello everyone, We recently expanded the scope of the Ops Docs SIG to also include any ops tooling. I think it's now time to move on to the next step of actually getting some of the old tooling in place and organized how we want it. We have several semi-abandoned repos from back when there was more work being done on ops tooling. 
During the great rebranding, those all were moved under the x/ namespace: https://opendev.org/x/?tab=&sort=recentupdate&q=osops Since these are now owned by an official SIG, we can move this content back under the openstack/ namespace. That should help increase visibility somewhat, and make things look a little more official. It will also allow contributors to tooling to get recognition for contributing to an import part of the OpenStack ecosystem. I do think it's can be a little more difficult to find things spread out over several repos though. For simplicity with finding tooling, as well as watching for reviews and helping with overall maintenance, I would like to move all of these under a common openstack/osops. Under that repo, we can then have a folder structure with tools/logging, tools/monitoring, etc. Then with everything in one place, we can have docs published in one place that helps find everything and easily links between tools. We can also capture some metadata about the tools, and use that to reflect their state in those docs. Please let me know if there are any objects to this plan. Otherwise, I will start cleaning things up and getting it staged in a new repo to be imported as an official repo owned by the SIG. Thanks! Sean From sean.mcginnis at gmx.com Thu Aug 27 17:10:08 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 27 Aug 2020 12:10:08 -0500 Subject: [all] Wallaby Release Schedule Message-ID: <68e239af-115b-3fb9-96f4-fc10130b90fe@gmx.com> Hey everyone, We have officially published the schedule for the Wallaby development cycle. That can now be found on the release.openstack.org site here: https://releases.openstack.org/wallaby/schedule.html PTLs, feel free to propose any updates if there are important project-specific deadlines you would like to include on the schedule. Just ping me if you need any examples of how that is done. Thanks! Sean From akekane at redhat.com Thu Aug 27 17:16:29 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 27 Aug 2020 22:46:29 +0530 Subject: [all] Wallaby Release Schedule In-Reply-To: <68e239af-115b-3fb9-96f4-fc10130b90fe@gmx.com> References: <68e239af-115b-3fb9-96f4-fc10130b90fe@gmx.com> Message-ID: Hi Sean, I think PTG dates are not highlighted, does it need to be highlighted? Thanks & Best Regards, Abhishek Kekane On Thu, Aug 27, 2020 at 10:43 PM Sean McGinnis wrote: > Hey everyone, > > We have officially published the schedule for the Wallaby development > cycle. That can now be found on the release.openstack.org site here: > > https://releases.openstack.org/wallaby/schedule.html > > PTLs, feel free to propose any updates if there are important > project-specific deadlines you would like to include on the schedule. > Just ping me if you need any examples of how that is done. > > Thanks! > > Sean > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From its-openstack at zohocorp.com Thu Aug 27 17:47:35 2020 From: its-openstack at zohocorp.com (its-openstack at zohocorp.com) Date: Thu, 27 Aug 2020 23:17:35 +0530 Subject: per user quota not applign in openstack train Message-ID: <17431083602.fe3e34d15305.5471067663006187936@zohocorp.com> Dear openstack, We are facing a peculiar issue with regards to users quota of resources. 
e.g: +------------------------------------------------------------------------------------------------------+ | project |   user  |  instance quota            |  no: of instance created      | | -----------|------------|-----------------------------------|------------------------------------------| |  tes      |     -      |      10                            |             -                               | |  test     |  user1 |      2                              |            2                               | |  test     |  user2 |      2                              |      error "quota over"          | |  test     |  user3 |      3                              |      only 1 instance allowed  | |  test     |  user4 | no user quota defined  |    able to create 10 instance| +-------------------------------------------------------------------------------------------------------+ As you see from mentioned table. when user1,user2, has instance quota of 2 and when user1 has created 2 instance, user2 unable to create instance. but user3 able to create only 1 more instance, user 4 has no quota applied so project quota 10 will be applied and he can create 10 instance. the quota is applied to each user but not tracked for each user, so this defeats the purpose of per user quota. Please help us with resolving this issue.     Regards, sysadmin team -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Aug 27 17:50:25 2020 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 27 Aug 2020 13:50:25 -0400 Subject: senlin auto scaling question Message-ID: Folks, I have created very simple cluster using following command openstack cluster create --profile myserver --desired-capacity 2 --min-size 2 --max-size 3 --strict my-asg It spun up 2 vm immediately now because the desired capacity is 2 so I am assuming if any node dies in the cluster it should spin up node to make count 2 right? so i killed one of node with "nove delete " but senlin didn't create node automatically to make desired capacity 2 (In AWS when you kill node in ASG it will create new node so is this senlin different then AWS?) From sean.mcginnis at gmx.com Thu Aug 27 18:19:11 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 27 Aug 2020 13:19:11 -0500 Subject: [all] Wallaby Release Schedule In-Reply-To: References: <68e239af-115b-3fb9-96f4-fc10130b90fe@gmx.com> Message-ID: <7494edec-c853-c8d1-4f6d-c076a97f4ed8@gmx.com> On 8/27/20 12:16 PM, Abhishek Kekane wrote: > Hi Sean, > > I think PTG dates are not highlighted, does it need to be highlighted? > > Thanks & Best Regards, > > Abhishek Kekane Yep, thanks for pointing that out Abhishek. At the time, the PTG dates were not confirmed yet. We do now have that set for October 26-30 now, so I have proposed a patch to update the schedule to reflect that: https://review.opendev.org/748504 Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Thu Aug 27 18:28:10 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 27 Aug 2020 14:28:10 -0400 Subject: [tc] Monthly meeting Message-ID: Hi everyone, Our monthly TC meeting is scheduled for next Thursday, September 3rd, at 1400 UTC. If you would like to add topics for discussion, please go to https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting and fill out your suggestions by Wednesday, September 2nd, at 1900 UTC. Thank you, Regards, Mohammed -- Mohammed Naser VEXXHOST, Inc. 
From arunkumar.palanisamy at tcs.com Wed Aug 26 19:02:22 2020 From: arunkumar.palanisamy at tcs.com (ARUNKUMAR PALANISAMY) Date: Wed, 26 Aug 2020 19:02:22 +0000 Subject: Trove images for Cluster testing. Message-ID: Hello Team, My name is ARUNKUMAR PALANISAMY, As part of our project requirement, we are evaluating trove components and need your support for experimental datastore Image for testing cluster. (Redis, Cassandra, MongoDB, Couchbase) 1.) We are running devstack enviorment with Victoria Openstack release and with this image (trove-master-guest-ubuntu-bionic-dev.qcow2), we are able to deploy mysql instance and and getting below error while creating mongoDB instances. "ModuleNotFoundError: No module named 'trove.guestagent.datastore.experimental' " 2.) While tried creating mongoDB image with diskimage-builder tool, but we are getting "Block device " element error. Regards, Arunkumar Palanisamy Cell: +49 172 6972490 =====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From arunkumar.palanisamy at tcs.com Wed Aug 26 19:34:32 2020 From: arunkumar.palanisamy at tcs.com (ARUNKUMAR PALANISAMY) Date: Wed, 26 Aug 2020 19:34:32 +0000 Subject: openstack-discuss Message-ID: Hello Team, My name is ARUNKUMAR PALANISAMY, As part of our project requirement, we are evaluating trove components and need your support for experimental datastore Image for testing cluster. (Redis, Cassandra, MongoDB, Couchbase) 1.) We are running devstack enviorment with Victoria Openstack release and with this image (trove-master-guest-ubuntu-bionic-dev.qcow2), we are able to deploy mysql instance and and getting below error while creating mongoDB instances. "ModuleNotFoundError: No module named 'trove.guestagent.datastore.experimental' " 2.) While tried creating mongoDB image with diskimage-builder tool, but we are getting "Block device " element error. Regards, Arunkumar Palanisamy =====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From nanthini.a.a at ericsson.com Thu Aug 27 11:09:11 2020 From: nanthini.a.a at ericsson.com (NANTHINI A A) Date: Thu, 27 Aug 2020 11:09:11 +0000 Subject: [Heat] Reg Creation of resource based on another resource attribute value Message-ID: Hi Team , I want to create the openstack subnet resource based on the openstack network resource's attribute STATUS value. i.e Create neutron subnet only when the neutron network status is ACTIVE . 
I can see currently the support of get_Attr function is not there in conditions section .Also the depends_on function accepts input as resource ids only .I cant pass a condition there . Is there any other way to implement the same .Please suggest . Thanks, A.Nanthini -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.bell at cern.ch Thu Aug 27 18:56:23 2020 From: tim.bell at cern.ch (Tim Bell) Date: Thu, 27 Aug 2020 20:56:23 +0200 Subject: per user quota not applign in openstack train In-Reply-To: <17431083602.fe3e34d15305.5471067663006187936@zohocorp.com> References: <17431083602.fe3e34d15305.5471067663006187936@zohocorp.com> Message-ID: <6559ED40-CE58-41A4-98B7-3AB90FF88E8A@cern.ch> > On 27 Aug 2020, at 19:47, its-openstack at zohocorp.com wrote: > > > > Dear openstack, > > We are facing a peculiar issue with regards to users quota of resources. > > e.g: > +------------------------------------------------------------------------------------------------------+ > | project | user | instance quota | no: of instance created | > | -----------|------------|-----------------------------------|------------------------------------------| > | tes | - | 10 | - | > | test | user1 | 2 | 2 | > | test | user2 | 2 | error "quota over" | > | test | user3 | 3 | only 1 instance allowed | > | test | user4 | no user quota defined | able to create 10 instance| > +-------------------------------------------------------------------------------------------------------+ > As you see from mentioned table. when user1,user2, has instance quota of 2 and when user1 has created 2 instance, user2 unable to create instance. > but user3 able to create only 1 more instance, user 4 has no quota applied so project quota 10 will be applied and he can create 10 instance. > > the quota is applied to each user but not tracked for each user, so this defeats the purpose of per user quota. > > Please help us with resolving this issue. > > I had understood that per-user quota was deprecated now. Have you had a look at creating dedicated per-usret projects with assigned quotas ? Tim > Regards, > sysadmin team > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Aug 27 20:09:21 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 27 Aug 2020 22:09:21 +0200 Subject: [neutron] Drivers meeting 28.08.2020 cancelled Message-ID: <20200827200921.6z7pl33zwgnk3caz@skaplons-mac> Hi, There is no any new RFEs in the agenda for tomorrow's drivers team meeting so lets cancel it and see You all next week. Have a great weekend. -- Slawek Kaplonski Principal software engineer Red Hat From tonyliu0592 at hotmail.com Thu Aug 27 20:47:36 2020 From: tonyliu0592 at hotmail.com (Tony Liu) Date: Thu, 27 Aug 2020 20:47:36 +0000 Subject: [Kolla] re-create container Message-ID: Hi, Is Kolla container created by playbook only or there is something like docker-compose to re-create container in case it's deleted after initial deployment? Thanks! Tony From ssbarnea at redhat.com Thu Aug 27 20:56:27 2020 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Thu, 27 Aug 2020 21:56:27 +0100 Subject: Do you want to render ANSI in Zuul console? 
In-Reply-To: <16d3afb3557c4ab745a3b244abb2d94b21c8d149.camel@redhat.com> References: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> <16d3afb3557c4ab745a3b244abb2d94b21c8d149.camel@redhat.com> Message-ID: <381A4B67-E346-4B17-8586-A07DDBCA1F79@redhat.com> This does not make much sense to me as it sounds as: Lets convert all the images to B&W because it takes less space on disk and tell user to use JS based AI to recolor to them. Displaying ANSI does not mean colorize my logs, has nothing to do with it. Displaying ANSI is about respecting the output produced by the executed tools. Zuul should respect the output received on stderr/stdout and display it like a console/ terminal. If the job author decides to use ANSI or not is up to them. Still, Zuul itself as product should just render ANSI content, mainly because I do not see any use-case where someone would want to render that text as RAW, as we all know ANSI escapes do not add any value to the user. Still, if the ability to display raw text, without ansi conversion is a real need, I could spend few more hours to implement it and add a preference option. Still, think twice before asking for a feature that adds some code complexity and may not prove to be of real practical use. We all know that the raw text is still available inside the big json file in case someone has doubs regarding what was rendered may be wrong. > On 27 Aug 2020, at 17:24, Sean Mooney wrote: > > On Thu, 2020-08-27 at 08:37 -0700, Clark Boylan wrote: >> On Thu, Aug 27, 2020, at 1:11 AM, Sorin Sbarnea wrote: >>> At this moment Zuul web interfaces displays output of commands as raw, >>> so any ANSI terminal output will display ugly artifacts. >>> >>> I tried enabling ANSI about half a year ago but even after providing >>> two different implementations, I was not able to popularize it enough. >>> >>> >>> As this is a UX related feature, I think would like more appropriate to >>> ask for feedback from openstack-discuss, likely the biggest consumer of >>> zuul web interface. >>> >>> Please comment/+/- on review below even if you are not a zuul core. At >>> least it should show if this is a desired feature to have or not: >> >> Without my Zuul hat on but with my "I debug a lot of openstack jobs" hat I would prefer we remove ansi color controls >> from our log files entirely. They make using grep and other machine processing tools more difficult. I find the >> utility of grep, ^F, elasticsearch, and the log level severity filtering far more useful than scrolling and looking >> for colors that may be arbitrarily applied by the source. > if we can remove them form the logs but use a javascpit lib in the viewer to still highlight thing that might be the > best of both worlds > i do fine the syntax hyilighign nice but we dont need color codes to do that. >> >>> >>> https://review.opendev.org/#/c/739444/ ✅ >>> >>> This review also includes a screenshot that shows how the rendering >>> looks (an alternative for using the sitepreview) >>> >>> Thanks >>> Sorin Sbarnea -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Aug 27 21:01:01 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 27 Aug 2020 14:01:01 -0700 Subject: Do you want to render ANSI in Zuul console? 
In-Reply-To: <381A4B67-E346-4B17-8586-A07DDBCA1F79@redhat.com> References: <7AC2A3FE-FAE3-4EA1-BC0F-2B104F0D13CB@redhat.com> <16d3afb3557c4ab745a3b244abb2d94b21c8d149.camel@redhat.com> <381A4B67-E346-4B17-8586-A07DDBCA1F79@redhat.com> Message-ID: <3e4a7cf3-1686-4a16-92c4-c7a52a8d9dfe@www.fastmail.com> On Thu, Aug 27, 2020, at 1:56 PM, Sorin Sbarnea wrote: > This does not make much sense to me as it sounds as: Lets convert all > the images to B&W because it takes less space on disk and tell user to > use JS based AI to recolor to them. > > Displaying ANSI does not mean colorize my logs, has nothing to do with it. > > Displaying ANSI is about respecting the output produced by the executed tools. > > Zuul should respect the output received on stderr/stdout and display it > like a console/ terminal. If the job author decides to use ANSI or not > is up to them. You asked if OpenStack would use/like to use such a feature. I'm suggesting a better option for OpenStack is to avoid adding a bunch of control codes to logs. > > Still, Zuul itself as product should just render ANSI content, mainly > because I do not see any use-case where someone would want to render > that text as RAW, as we all know ANSI escapes do not add any value to > the user. > > Still, if the ability to display raw text, without ansi conversion is > a real need, I could spend few more hours to implement it and add a > preference option. Still, think twice before asking for a feature that > adds some code complexity and may not prove to be of real practical > use. We all know that the raw text is still available inside the big > json file in case someone has doubs regarding what was rendered may be > wrong. > > > On 27 Aug 2020, at 17:24, Sean Mooney wrote: > > > > On Thu, 2020-08-27 at 08:37 -0700, Clark Boylan wrote: > >> On Thu, Aug 27, 2020, at 1:11 AM, Sorin Sbarnea wrote: > >>> At this moment Zuul web interfaces displays output of commands as raw, > >>> so any ANSI terminal output will display ugly artifacts. > >>> > >>> I tried enabling ANSI about half a year ago but even after providing > >>> two different implementations, I was not able to popularize it enough. > >>> > >>> > >>> As this is a UX related feature, I think would like more appropriate to > >>> ask for feedback from openstack-discuss, likely the biggest consumer of > >>> zuul web interface. > >>> > >>> Please comment/+/- on review below even if you are not a zuul core. At > >>> least it should show if this is a desired feature to have or not: > >> > >> Without my Zuul hat on but with my "I debug a lot of openstack jobs" hat I would prefer we remove ansi color controls > >> from our log files entirely. They make using grep and other machine processing tools more difficult. I find the > >> utility of grep, ^F, elasticsearch, and the log level severity filtering far more useful than scrolling and looking > >> for colors that may be arbitrarily applied by the source. > > if we can remove them form the logs but use a javascpit lib in the viewer to still highlight thing that might be the > > best of both worlds > > i do fine the syntax hyilighign nice but we dont need color codes to do that. 
> >> > >>> > >>> https://review.opendev.org/#/c/739444/ ✅ > >>> > >>> This review also includes a screenshot that shows how the rendering > >>> looks (an alternative for using the sitepreview) > >>> > >>> Thanks > >>> Sorin Sbarnea > From viroel at gmail.com Thu Aug 27 21:49:53 2020 From: viroel at gmail.com (Douglas) Date: Thu, 27 Aug 2020 18:49:53 -0300 Subject: [manila] Victoria Collab Review next Tuesday (Sep 1st) Message-ID: Hi everybody We will have a new edition of our collaborative review next Tuesday, September 1st, where we'll go through the code and review the proposed feature Share Server Migration[1][2]. This meeting is scheduled for two hours, starting at 5:00PM UTC. Meeting notes and videoconference links will be available here[3]. Feel free to attend if you are interested and available. Hoping to see you there, - dviroel [1] https://opendev.org/openstack/manila-specs/src/branch/master/specs/victoria/share-server-migration.rst [2] https://review.opendev.org/#/q/topic:bp/share-server-migration+(status:open) [3] https://etherpad.opendev.org/p/manila-victoria-collab-review -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Thu Aug 27 22:09:10 2020 From: anlin.kong at gmail.com (Lingxian Kong) Date: Fri, 28 Aug 2020 10:09:10 +1200 Subject: Trove images for Cluster testing. In-Reply-To: References: Message-ID: Hi Arunkumar, Unfortunately, for now Trove only supports MySQL and MariaDB, I'm working on adding PostgreSQL support. All other datastores are unmaintained right now. Since this(Victoria) dev cycle, docker container was introduced in Trove guest agent in order to remove the maintenance overhead for multiple Trove guest images. We only need to maintain one single guest image but could support different datastores. We have to do that as such a small Trove team in the community. If supporting Redis, Cassandra, MongoDB or Couchbase is in your feature request, you are welcome to contribute to Trove. Please let me know if you have any other questions. You are also welcome to join #openstack-trove IRC channel for discussion. --- Lingxian Kong Senior Software Engineer Catalyst Cloud www.catalystcloud.nz On Fri, Aug 28, 2020 at 6:45 AM ARUNKUMAR PALANISAMY < arunkumar.palanisamy at tcs.com> wrote: > Hello Team, > > > > My name is ARUNKUMAR PALANISAMY, > > > > As part of our project requirement, we are evaluating trove components and > need your support for experimental datastore Image for testing cluster. > (Redis, Cassandra, MongoDB, Couchbase) > > > > 1.) We are running devstack enviorment with Victoria Openstack release > and with this image (trove-master-guest-ubuntu-bionic-dev.qcow2 > ), > we are able to deploy mysql instance and and getting below error while > creating mongoDB instances. > > > > *“ModuleNotFoundError: No module named > 'trove.guestagent.datastore.experimental' “* > > > > 2.) While tried creating mongoDB image with diskimage-builder > tool, but we are > getting “Block device ” element error. > > > > > > Regards, > > Arunkumar Palanisamy > > Cell: +49 172 6972490 > > > > =====-----=====-----===== > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. 
If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sorrison at gmail.com Thu Aug 27 22:10:50 2020 From: sorrison at gmail.com (Sam Morrison) Date: Fri, 28 Aug 2020 08:10:50 +1000 Subject: [neutron][networking-midonet] Maintainers needed In-Reply-To: <610412AF-AADF-44BD-ABA2-BA289B7C8F8A@redhat.com> References: <0AC5AC07-E97E-43CC-B344-A3E992B8CCA4@netways.de> <610412AF-AADF-44BD-ABA2-BA289B7C8F8A@redhat.com> Message-ID: <5E2F5826-559E-42E9-84C5-FA708E5A122A@gmail.com> We (Nectar Research Cloud) use midonet heavily too, it works really well and we haven’t found another driver that works for us. We tried OVN but it just doesn’t scale to the size of environment we have. I’m happy to help too. Cheers, Sam > On 31 Jul 2020, at 2:06 am, Slawek Kaplonski wrote: > > Hi, > > Thx Sebastian for stepping in to maintain the project. That is great news. > I think that at the beginning You should do 2 things: > - sync with Takashi Yamamoto (I added him to the loop) as he is probably most active current maintainer of this project, > - focus on fixing networking-midonet ci which is currently broken - all scenario jobs aren’t working fine on Ubuntu 18.04 (and we are going to move to 20.04 in this cycle), migrate jobs to zuulv3 from the legacy ones and finally add them to the ci again, > > I can of course help You with ci jobs if You need any help. Feel free to ping me on IRC or email (can be off the list). > >> On 29 Jul 2020, at 15:24, Sebastian Saemann wrote: >> >> Hi Slawek, >> >> we at NETWAYS are running most of our neutron networking on top of midonet and wouldn't be too happy if it gets deprecated and removed. So we would like to take over the maintainer role for this part. >> >> Please let me know how to proceed and how we can be onboarded easily. >> >> Best regards, >> >> Sebastian >> >> --  >> Sebastian Saemann >> Head of Managed Services >> >> NETWAYS Managed Services GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg >> Tel: +49 911 92885-0 | Fax: +49 911 92885-77 >> CEO: Julian Hein, Bernd Erk | AG Nuernberg HRB25207 >> https://netways.de | sebastian.saemann at netways.de >> >> ** NETWAYS Web Services - https://nws.netways.de ** > > — > Slawek Kaplonski > Principal software engineer > Red Hat > > From gmann at ghanshyammann.com Thu Aug 27 22:14:57 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 27 Aug 2020 17:14:57 -0500 Subject: [kolla] Focal upgrade In-Reply-To: References: Message-ID: <17431fcfe5e.fab23a5543953.1175910699983452884@ghanshyammann.com> ---- On Thu, 27 Aug 2020 03:08:44 -0500 Mark Goddard wrote ---- > Hi, > > For the Victoria release we will be moving our Ubuntu support from > Bionic 18.04 to the Focal 20.04 LTS release. This applies to both the > base container image and host OS. > > We would like to request feedback from any Ubuntu users about how they > typically deal with a distro upgrade like this. I would assume that > the following workflow would be used: > > 1. start with a Ussuri release on Bionic > 2. distro upgrade to Focal > 3. OpenStack upgrade to Victoria > > However, that would imply that it would not be possible to make any > more changes to the Ussuri deploy after the Focal upgrade, since Kolla > Ansible Ussuri release does not support Focal (it is blocked by > prechecks). > > An alternative approach is: > > 1. 
start with a Ussuri release on Bionic > 2. OpenStack upgrade to Victoria > 3. distro upgrade to Focal > I am not ubuntu user or not done such upgrade but I think later approach is a better way. Ussuri release for sure will not work on Focal ( by seeing the fixes in various projects for Focal migration.). Victoria still is tested on Bionic so it is surly tested till we move our testing to Focal. -gmann > This implies that Victoria must support both Bionic and Focal as a > host OS, which it currently does. This flow matches more closely what > we are currently testing in CI (steps 1 and 2 only). > > In both cases, Victoria container images are based on Focal. > > Feedback on this would be appreciated. > > Thanks, > Mark > > From melwittt at gmail.com Thu Aug 27 23:51:34 2020 From: melwittt at gmail.com (melanie witt) Date: Thu, 27 Aug 2020 16:51:34 -0700 Subject: per user quota not applign in openstack train In-Reply-To: <6559ED40-CE58-41A4-98B7-3AB90FF88E8A@cern.ch> References: <17431083602.fe3e34d15305.5471067663006187936@zohocorp.com> <6559ED40-CE58-41A4-98B7-3AB90FF88E8A@cern.ch> Message-ID: <5f3f8a59-a41a-2a69-007e-1a4dbf4f0ed7@gmail.com> On 8/27/20 11:56, Tim Bell wrote: > >> On 27 Aug 2020, at 19:47, its-openstack at zohocorp.com >> As you see from mentioned table. when user1,user2, has instance quota >> of 2 and when user1 has created 2 instance, user2 unable to create >> instance. >> but user3 able to create only 1 more instance, user 4 has no quota >> applied so project quota 10 will be applied and he can create 10 instance. >> >> the quota is applied to each user but not tracked for each user, so >> this defeats the purpose of per user quota. >> >> Please help us with resolving this issue. Hi, I tried your scenario in devstack and found a bug in the [lack of] scoping for per-user quotas [1] and have a proposed a patch (still needs test coverage): https://review.opendev.org/748550 If you could please try out this patch and let me know whether you find any issues, it would be appreciated. 
With this patch, I got the following result with your same scenario (first user has instances quota of 2, second user has 2, third user has 3, last user has no per-user quota, project has 10): $ nova list --fields name,user_id,created --sort created_at:asc +--------------------------------------+-------+----------------------------------+----------------------+ | ID | Name | User Id | Created | +--------------------------------------+-------+----------------------------------+----------------------+ | 5a52f400-2bef-4f00-add4-df69b6ac195f | one | e630b64070f042e98381bb7f6be9919c | 2020-08-27T21:41:06Z | | 800c7673-0846-4c2e-a502-2d8db7ceab40 | two | e630b64070f042e98381bb7f6be9919c | 2020-08-27T21:42:07Z | | af9b35ce-6ba9-4657-aacf-aedb1915ce9a | three | b34b2b234e0545b9a54ce7f63d9b116e | 2020-08-27T23:14:36Z | | d83e9c56-ccc5-4d65-bb81-ac3be3a8f575 | four | b34b2b234e0545b9a54ce7f63d9b116e | 2020-08-27T23:16:24Z | | 56aa06d2-2d1f-49e4-a314-e2e06f68fef0 | five | 3278d32e38534016963e457f6c9d07d7 | 2020-08-27T23:16:59Z | | 38e84ebb-fb88-4e39-b5fe-32bbcdd5f062 | six | 3278d32e38534016963e457f6c9d07d7 | 2020-08-27T23:17:19Z | | 7376788f-c51a-4a6d-be91-c14bb71b3541 | seven | 3278d32e38534016963e457f6c9d07d7 | 2020-08-27T23:17:37Z | | f072d745-c37f-493d-8fa8-7dc83d520539 | eight | 06f1a9d74d214fa1b352d4a3f41e3421 | 2020-08-27T23:18:21Z | | 58688387-8f8c-4b60-acac-40d11a8ca5b9 | nine | 06f1a9d74d214fa1b352d4a3f41e3421 | 2020-08-27T23:18:37Z | | 1a25e99f-2bfe-42ea-b1b3-e88a78b11293 | ten | 06f1a9d74d214fa1b352d4a3f41e3421 | 2020-08-27T23:18:53Z | +--------------------------------------+-------+----------------------------------+----------------------+ And I was not able to create a fourth instance with the last user because that would exceed the total project quota of 10. Also, be careful with how you assign per-user quota in nova. The first time I tried your scenario, I did not make sure to use user_id UUID instead of name. The per-user quotas will not work properly if you do not specify the user_id as a UUID, example: $ nova quota-update --user 3278d32e38534016963e457f6c9d07d7 --instance 3 518c0eaec2754217bee6b67a1ec6f884 where the first UUID is the user_id and the second UUID is the project_id. > I had understood that per-user quota was deprecated now. > > Have you had a look at creating dedicated per-usret projects with > assigned quotas ? Tim is correct in that per-user quota is not encouraged because when we move to unified limits in nova [2], they will be removed [3]. If you are able to use a dedicated project per user instead of using per-user quotas, that is a better approach. Hope this helps, -melanie [1] https://bugs.launchpad.net/nova/+bug/1893284 [2] https://review.opendev.org/#/q/topic:bp/unified-limits-nova [3] https://docs.openstack.org/nova/latest/admin/quotas.html#view-and-update-quota-values-for-a-project-user From zbitter at redhat.com Fri Aug 28 01:39:53 2020 From: zbitter at redhat.com (Zane Bitter) Date: Thu, 27 Aug 2020 21:39:53 -0400 Subject: [Heat] Reg Creation of resource based on another resource attribute value In-Reply-To: References: Message-ID: On 27/08/20 7:09 am, NANTHINI A A wrote: > Hi Team , > >     I want to create the openstack subnet resource based on the > openstack network resource’s attribute STATUS value. > >    i.e Create neutron subnet only when the neutron network status is > ACTIVE . > >    I can see currently the support of get_Attr function is not there in > conditions section . Correct, and that's because they're evaluated at different times. 
When you create (or update) the stack Heat decides immediately which resources are enabled. But the attribute values are not known until after that resource is created. > Also the depends_on function accepts input as > resource ids only .I cant pass a condition there . I believe you can depend on a resource that is conditionally disabled without causing an error. >    Is there any other way to implement the same .Please suggest . There isn't. cheers, Zane. From thierry at openstack.org Fri Aug 28 10:01:59 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 28 Aug 2020 12:01:59 +0200 Subject: [ops] Restructuring OSOPS tools In-Reply-To: References: Message-ID: Sean McGinnis wrote: > [...] > Since these are now owned by an official SIG, we can move this content > back under the openstack/ namespace. That should help increase > visibility somewhat, and make things look a little more official. It > will also allow contributors to tooling to get recognition for > contributing to an import part of the OpenStack ecosystem. > > I do think it's can be a little more difficult to find things spread out > over several repos though. For simplicity with finding tooling, as well > as watching for reviews and helping with overall maintenance, I would > like to move all of these under a common openstack/osops. Under that > repo, we can then have a folder structure with tools/logging, > tools/monitoring, etc. Also the original setup[1] called for moving things from one repo to another as they get more mature, which loses history. So I agree a single repository is better. However, one benefit of the original setup was that it made it really low-friction to land half-baked code in the osops-tools-contrib repository. The idea was to encourage tools sharing, rather than judge quality or curate a set. I think it's critical for the success of OSops that operator code can be brought in with very low friction, and curation can happen later. If we opt for a theme-based directory structure, we could communicate that a given tool is in "unmaintained/use-at-your-own-risk" status using metadata. But thinking more about it, I would suggest we keep a low-friction "contrib/" directory in the repo, which more clearly communicates "use at your own risk" for anything within it. Then we could move tools under the "tools/" directory structure if a community forms within the SIG to support and improve a specific tool. That would IMHO allow both low-friction landing *and* curation to happen. > [...] > Please let me know if there are any objects to this plan. Otherwise, I > will start cleaning things up and getting it staged in a new repo to be > imported as an official repo owned by the SIG. I like it! 
[1] https://wiki.openstack.org/wiki/Osops -- Thierry Carrez (ttx) From skaplons at redhat.com Fri Aug 28 10:04:11 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 28 Aug 2020 12:04:11 +0200 Subject: [neutron][gate] verbose q-svc log files and e-r indexing In-Reply-To: <62e4fcd2-0f7a-a7d3-7692-3ad9a05c8399@gmail.com> References: <20200818103323.wq5upyjn4nzsqhx7@skaplons-mac> <20200818150052.u4xkjsptejikwcny@skaplons-mac> <62e4fcd2-0f7a-a7d3-7692-3ad9a05c8399@gmail.com> Message-ID: <20200828100411.l7egqidkfzfi4xjt@skaplons-mac> Hi, On Tue, Aug 18, 2020 at 02:10:40PM -0700, melanie witt wrote: > On 8/18/20 08:00, Slawek Kaplonski wrote: > > Hi, > > > > I proposed patch [1] which seems that decreased size of the neutron-server log > > a bit - see [2] but it's still about 40M :/ > > > > [1] https://review.opendev.org/#/c/730879/ > > [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcdn.com/746714/1/check/tempest-full-py3/5c1399c/controller/logs/ > > Thanks for jumping in to help, Slawek! Indeed your proposed patch improves things from 60M-70M => 40M (good!). > > With your patch applied, the most frequent potential log message I see now is like this: > > Aug 18 14:40:21.294549 ubuntu-bionic-rax-iad-0019321276 neutron-server[5829]: DEBUG neutron_lib.callbacks.manager [None req-eadfbe92-eaee-4e3e-a5c0-f18aa8ba9772 None None] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8764691834039', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-4033733'] for agent, after_update {{(pid=6206) _notify_loop /opt/stack/neutron-lib/neutron_lib/callbacks/manager.py:193}} > > with the line count difference being with and without: > > $ wc -l "screen-q-svc.txt" > 102493 screen-q-svc.txt > > $ grep -v "neutron_lib.callbacks.manager" "screen-q-svc.txt" |wc -l > 83261 > > so I suppose we could predict a decrease in file size of about 40M => 32M if we were able to remove the neutron_lib.callbacks.manager output. I was looking at this again today but I'm really not sure if we should get rid of those messages from the log. For now I think that indexing of screen-q-svc.txt file is disabled so this size of the log shouldn't be big problem (I hope) and I would like to not remove any other debug messages from it if that will not be really necessary. > > But I'm not sure whether that's a critical debugging element or not. > > -melanie > -- Slawek Kaplonski Principal software engineer Red Hat From mdulko at redhat.com Fri Aug 28 11:49:32 2020 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Fri, 28 Aug 2020 13:49:32 +0200 Subject: [kuryr] vPTG October 2020 In-Reply-To: <54f84af6378e1507d1f04c0aab733922cdc2c8bd.camel@redhat.com> References: <54f84af6378e1507d1f04c0aab733922cdc2c8bd.camel@redhat.com> Message-ID: <68470a74141032e96e04e3b26fb2cab8eba80e61.camel@redhat.com> On Mon, 2020-08-17 at 09:46 +0200, Michał Dulko wrote: > Hello all, > > There's a vPTG October 2020 project signup process going on and I'd > like to ask if you want me to reserve an hour or two there for a sync > up on the priorities and plans of various parts of the team. I haven't heard much, so I've just reserved 2 one-hour slots that won't overlap with Octavia sessions: * 7-8 UTC on October 27th * 15-16 UTC on October 29th I think we'll treat those as pretty open discussion, but feel free to add stuff to the agenda etherpad, so that team members could prepare for the topics in advance. 
[1] https://etherpad.opendev.org/p/kuryr-virtual-W-ptg Thanks, Michał From adriant at catalystcloud.nz Fri Aug 28 12:36:31 2020 From: adriant at catalystcloud.nz (Adrian Turjak) Date: Sat, 29 Aug 2020 00:36:31 +1200 Subject: [tc][telemetry][gnocchi] The future of Gnocchi in OpenStack Message-ID: <0a22dd8a-2b54-cd22-1734-619d28d6efc8@catalystcloud.nz> Hey OpenStackers, We're currently in the process of discussing what to do with OpenStack's reliance on Gnocchi, and at present it is looking like we are most likely to just fork it back under a new name (currently Farfalle to stick with the pasta theme). The discussion is mostly happening here: https://review.opendev.org/#/c/744592/ But for those running Gnocchi in prod, this is likely something you may want to know about and we'd like to hear from you. A bit of history: Gnocchi started off as a new backend for Ceilometer in OpenStack, and eventually become the defacto API for telemetry samples when that was removed from Ceilometer (as backed by MongoDB). Gnocchi was eventually spun off outside of OpenStack, but still essentially remained our API for telemetry despite not being an official part of OpenStack anymore. Since then the development around it seems to have stalled, with pull requests left unreviewed, CI broken, and even the domain for the docs lapsing once. They have essentially said the project is unmaintained themselves: https://github.com/gnocchixyz/gnocchi/issues/1049 Given that OpenStack telemetry relies on it, we needed to decide what to do. We tried talking to the devs which spun it off outside of OpenStack, but they seem disinclined to interact with the OpenStack community, or move the project back to our infra/governance despite OpenStack looking like the only consumers of Gnocchi as a project. We want to find a solution, and the feeling is that they don't. So we've opted to fork it back and now the discussion is how to approach that fork. The OpenStack community doesn't want to maintain a time series database, but our telemetry API is part of it. We are putting it under non-OpenStack namespace to start, but we need to decide what the long term place for it should be. Do we want to make it an official project again? Do we keep it just as an API and drop the time series DB part for another DB? Do we build a new API back into Ceilometer and switch to a different backend like InfluxDB? We don't know yet, and we want some input from people who use the service so we can hopefully work with OpenStack telemetry as a whole and figure out what the long term picture is. If Gnocchi matters to you at all, or you use it, we want to hear from you. Cheers, Adrian Turjak From mnaser at vexxhost.com Fri Aug 28 12:45:40 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 28 Aug 2020 08:45:40 -0400 Subject: [tc] office hours update Message-ID: Hi everyone, In order to be able to have more folks available during office hours, we have the following newly available times once this patch lands: https://review.opendev.org/746167 * 01:00 UTC on Tuesdays: http://www.timeanddate.com/worldclock/fixedtime.html?hour=01&min=00&sec=0 * 15:00 UTC on Wednesdays: http://www.timeanddate.com/worldclock/fixedtime.html?hour=15&min=00&sec=0 We look forward to seeing our community present at those office hours with anything they have. Thank you, Mohammed -- Mohammed Naser VEXXHOST, Inc. 
From mark at stackhpc.com Fri Aug 28 13:47:41 2020 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 28 Aug 2020 14:47:41 +0100 Subject: [Kolla] re-create container In-Reply-To: References: Message-ID: On Thu, 27 Aug 2020 at 21:48, Tony Liu wrote: > > Hi, > > Is Kolla container created by playbook only or there is > something like docker-compose to re-create container in > case it's deleted after initial deployment? If you run any of the deploy, reconfigure, deploy-containers or upgrade commands, kolla ansible will ensure that necessary containers exist, even if they were removed. > > Thanks! > Tony > > From zbitter at redhat.com Fri Aug 28 14:48:53 2020 From: zbitter at redhat.com (Zane Bitter) Date: Fri, 28 Aug 2020 10:48:53 -0400 Subject: [tc][telemetry][gnocchi] The future of Gnocchi in OpenStack In-Reply-To: <0a22dd8a-2b54-cd22-1734-619d28d6efc8@catalystcloud.nz> References: <0a22dd8a-2b54-cd22-1734-619d28d6efc8@catalystcloud.nz> Message-ID: <5cc54d3f-ecf3-a769-9edb-187efc1c2d3f@redhat.com> On 28/08/20 8:36 am, Adrian Turjak wrote: > Hey OpenStackers, > > We're currently in the process of discussing what to do with OpenStack's > reliance on Gnocchi, and at present it is looking like we are most > likely to just fork it back under a new name (currently Farfalle to > stick with the pasta theme). > > The discussion is mostly happening here: > https://review.opendev.org/#/c/744592/ > > But for those running Gnocchi in prod, this is likely something you may > want to know about and we'd like to hear from you. > > A bit of history: Gnocchi started off as a new backend for Ceilometer in > OpenStack, and eventually become the defacto API for telemetry samples > when that was removed from Ceilometer (as backed by MongoDB). Gnocchi > was eventually spun off outside of OpenStack, but still essentially > remained our API for telemetry despite not being an official part of > OpenStack anymore. I think a large part of the issue here is that there are multiple reasons for wanting (small-t) telemetry from OpenStack, and historically because of reasons they have all been conflated into one Thing with the result that sometimes one use case wins. At least 3 that I can think of are: 1) Monitoring the OpenStack infrastructure by the operator, including feeding into business processes like reporting, capacity planning &c. 2) Billing 3) Monitoring user resources by the user/application, either directly or via other OpenStack services like Heat or Senlin. For the first, you just want to be able to dump data into a TSDB of the operator's choice. Since all of the reporting requirements are business-specific anyway, it's up to the operator to decide how they want to store the data and how they want to interact with it. It appears that this may have been the theory behind the Gnocchi split. On the other hand, for the third one you really need something that should be an official OpenStack API with all of the attendant stability guarantees, because it is part of OpenStack's user interface. The second lands somewhere in between; AIUI CloudKitty is written to support multiple back-ends, with OpenStack Telemetry being the primary one. So it needs a fairly stable API because it's consumed by other OpenStack projects, but it's ultimately operator-facing. As I have argued before, when we are thinking about road maps we need to think of these as different use cases, and they're different enough that they are probably best served by least two separate tools. 
Mohammed has made a compelling argument in the past that Prometheus is more or less the industry standard for the first use case, and we should just export metrics to that directly in the OpenStack services, rather than going through the Ceilometer collector. I don't know what should be done about the third, but I do know that currently Telemetry is breaking Heat's gate and people are seriously discussing disabling the Telemetry-related tests, which I assume would mean deprecating the resources. Monasca offers an alternative, but isn't preferred for some distributors and operators because it brings the whole Java ecosystem along for the ride (managing the Python one is already hard enough). cheers, Zane. From mahdi.abbasi.2013 at gmail.com Fri Aug 28 08:03:07 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Fri, 28 Aug 2020 12:33:07 +0430 Subject: Horizon installation problem Message-ID: Hi openstack development team, When i want install horizon with pip3 i recieve and error: Could not satisfy constraints for horizon: installation from path or url cannot be constrained to a version. Please help me Best regards Mahdi -------------- next part -------------- An HTML attachment was scrubbed... URL: From cohuck at redhat.com Fri Aug 28 13:47:41 2020 From: cohuck at redhat.com (Cornelia Huck) Date: Fri, 28 Aug 2020 15:47:41 +0200 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200826064117.GA22243@joy-OptiPlex-7040> References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <20200819212234.223667b3@x1.home> <20200820031621.GA24997@joy-OptiPlex-7040> <20200825163925.1c19b0f0.cohuck@redhat.com> <20200826064117.GA22243@joy-OptiPlex-7040> Message-ID: <20200828154741.30cfc1a3.cohuck@redhat.com> On Wed, 26 Aug 2020 14:41:17 +0800 Yan Zhao wrote: > previously, we want to regard the two mdevs created with dsa-1dwq x 30 and > dsa-2dwq x 15 as compatible, because the two mdevs consist equal resources. > > But, as it's a burden to upper layer, we agree that if this condition > happens, we still treat the two as incompatible. > > To fix it, either the driver should expose dsa-1dwq only, or the target > dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30. AFAIU, these are mdev types, aren't they? So, basically, any management software needs to take care to use the matching mdev type on the target system for device creation? 
From smooney at redhat.com Fri Aug 28 14:04:12 2020 From: smooney at redhat.com (Sean Mooney) Date: Fri, 28 Aug 2020 15:04:12 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200828154741.30cfc1a3.cohuck@redhat.com> References: <20200814051601.GD15344@joy-OptiPlex-7040> <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <20200819212234.223667b3@x1.home> <20200820031621.GA24997@joy-OptiPlex-7040> <20200825163925.1c19b0f0.cohuck@redhat.com> <20200826064117.GA22243@joy-OptiPlex-7040> <20200828154741.30cfc1a3.cohuck@redhat.com> Message-ID: <8f5345be73ebf4f8f7f51d6cdc9c2a0d8e0aa45e.camel@redhat.com> On Fri, 2020-08-28 at 15:47 +0200, Cornelia Huck wrote: > On Wed, 26 Aug 2020 14:41:17 +0800 > Yan Zhao wrote: > > > previously, we want to regard the two mdevs created with dsa-1dwq x 30 and > > dsa-2dwq x 15 as compatible, because the two mdevs consist equal resources. > > > > But, as it's a burden to upper layer, we agree that if this condition > > happens, we still treat the two as incompatible. > > > > To fix it, either the driver should expose dsa-1dwq only, or the target > > dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30. > > AFAIU, these are mdev types, aren't they? So, basically, any management > software needs to take care to use the matching mdev type on the target > system for device creation? or just do the simple thing of use the same mdev type on the source and dest. matching mdevtypes is not nessiarly trivial. we could do that but we woudl have to do that in python rather then sql so it would be slower to do at least today. we dont currently have the ablity to say the resouce provider must have 1 of these set of traits. just that we must have a specific trait. this is a feature we have disucssed a couple of times and delayed untill we really really need it but its not out of the question that we could add it for this usecase. i suspect however we would do exact match first and explore this later after the inital mdev migration works. by the way i was looking at some vdpa reslated matiail today and noticed vdpa devices are nolonger usign mdevs and and now use a vhost chardev so i guess we will need a completely seperate mechanioum for vdpa vs mdev migration as a result. that is rather unfortunet but i guess that is life. > From fungi at yuggoth.org Fri Aug 28 16:11:25 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 28 Aug 2020 16:11:25 +0000 Subject: Horizon installation problem In-Reply-To: References: Message-ID: <20200828161125.vbqwob5i6ocyreor@yuggoth.org> On 2020-08-28 12:33:07 +0430 (+0430), mahdi abbasi wrote: [...] > When i want install horizon with pip3 i recieve and error: > > Could not satisfy constraints for horizon: installation from path or url > cannot be constrained to a version. [...] This sounds like you're passing a -c option to pip telling it to apply a constraints file, but you're attempting to install Horizon from source instead of from a released package so it can't be matched against the constraints list. The easy workaround is to delete the Horizon entry from the constraints list you're using, or consume a release of Horizon from PyPI instead of using a source checkout. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gmann at ghanshyammann.com Fri Aug 28 16:35:55 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 28 Aug 2020 11:35:55 -0500 Subject: [all][tc][policy] Progress report of consistent and secure default policies pop-up team Message-ID: <17435ecf42a.129e8542b80609.3552606214442342355@ghanshyammann.com> Hello Everyone, This is a regular update on progress in 'Consistent and Secure Default Policies Popup Team'. We will try to make it a monthly report form now onwards. Progress so far: ============ * Popup team meet twice in a month and discuss and work on progress and pre-work to do. - https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team#Meeting * Pre-work to provide a smooth migration path to the new policy ** Migrate Default Policy Format from JSON to YAML - This involves oslo side + each project side works. - oslo side work to provide tool and utils method are merged (one patch is in gate). - The new tool 'oslopolicy-convert-json-to-yaml' is available now to convert your existing JSON formatted policy file to YAML formatted in a backward-compatible way. - I have started to do it in Nova (need to update the patch though) to give example work for other projects: https://review.opendev.org/#/c/748059/ - all work is tracked here: https://review.opendev.org/#/q/topic:bp/policy-json-to-yaml+(status:open+OR+status:merged) ** Improving documentation about target resources (oslo.policy) - https://bugs.launchpad.net/oslo.policy/+bug/1886857 - raildo pushed the patch which is under review: https://review.opendev.org/#/c/743318/ * Team Progress: (list of a team interested or have volunteer to work) ** Keystone (COMPLETED; use as a reference) ** Nova (COMPLETED; use as a reference) - All APIs except deprecated APIs were done in the Ussuri cycle and deprecated APIs also done now. ** Cyborg (in-progress) - Spec is merged, code under review. ** Barbican (not started) ** Neutron (not started) ** Cinder (not started) ** Manila (not started) Why This Is Important ================= (I have copied it from Colleen email which is nicely written) Separating system, domain, and project-scope APIs and providing meaningful default roles is critical to facilitating secure cloud deployments and to fulfilling OpenStack's vision as a fully self-service infrastructure provider[1]. Until all projects have completed this policy migration, the "reader" role that exists in keystone is dangerously misleading, and the `[oslo_policy]/enforce_scope` option has limited usefulness as long as projects lack uniformity in how an administrator can use scoped APIs. How You Can Help ================ Contributor: - You can help by starting the work in your (or any other you would like to help) project and attend popup team meeting in case of any question, review request etc. Cloud operator: - Please help review the proposed policy rule changes to sanity-check the new scope and role defaults. - Migrate your JSON formatted policy file to YAML JSON formatted file can be problematic in the various way as described here[2]. You can use 'oslopolicy-convert-json-to-yaml' tool [3] to convert your existing JSON formatted policy file to YAML formatted in a backward-compatible way. 
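As a rough sketch of what that conversion could look like for a single service (the namespace and file paths here are examples, not required locations; see [3] for the authoritative usage):

# Convert an existing JSON-formatted policy file for the Nova namespace to
# YAML, preserving the current overrides in a backward-compatible way.
oslopolicy-convert-json-to-yaml --namespace nova \
  --policy-file /etc/nova/policy.json \
  --output-file /etc/nova/policy.yaml

After reviewing the generated YAML file, point the service's [oslo_policy] policy_file option at it instead of the old JSON file.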
[1] https://governance.openstack.org/tc/reference/technical-vision.html#self-service [2] https://specs.openstack.org/openstack/oslo-specs/specs/victoria/policy-json-to-yaml.html#problem-description [3] https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html -gmann & raildo From cboylan at sapwetik.org Fri Aug 28 16:54:11 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 28 Aug 2020 09:54:11 -0700 Subject: [neutron][gate] verbose q-svc log files and e-r indexing In-Reply-To: <20200828100411.l7egqidkfzfi4xjt@skaplons-mac> References: <20200818103323.wq5upyjn4nzsqhx7@skaplons-mac> <20200818150052.u4xkjsptejikwcny@skaplons-mac> <62e4fcd2-0f7a-a7d3-7692-3ad9a05c8399@gmail.com> <20200828100411.l7egqidkfzfi4xjt@skaplons-mac> Message-ID: <22d34440-fc65-4f4b-a5e7-c5725283aa58@www.fastmail.com> On Fri, Aug 28, 2020, at 3:04 AM, Slawek Kaplonski wrote: > Hi, > > On Tue, Aug 18, 2020 at 02:10:40PM -0700, melanie witt wrote: > > On 8/18/20 08:00, Slawek Kaplonski wrote: > > > Hi, > > > > > > I proposed patch [1] which seems that decreased size of the neutron-server log > > > a bit - see [2] but it's still about 40M :/ > > > > > > [1] https://review.opendev.org/#/c/730879/ > > > [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcdn.com/746714/1/check/tempest-full-py3/5c1399c/controller/logs/ > > > > Thanks for jumping in to help, Slawek! Indeed your proposed patch improves things from 60M-70M => 40M (good!). > > > > With your patch applied, the most frequent potential log message I see now is like this: > > > > Aug 18 14:40:21.294549 ubuntu-bionic-rax-iad-0019321276 neutron-server[5829]: DEBUG neutron_lib.callbacks.manager [None req-eadfbe92-eaee-4e3e-a5c0-f18aa8ba9772 None None] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8764691834039', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-4033733'] for agent, after_update {{(pid=6206) _notify_loop /opt/stack/neutron-lib/neutron_lib/callbacks/manager.py:193}} > > > > with the line count difference being with and without: > > > > $ wc -l "screen-q-svc.txt" > > 102493 screen-q-svc.txt > > > > $ grep -v "neutron_lib.callbacks.manager" "screen-q-svc.txt" |wc -l > > 83261 > > > > so I suppose we could predict a decrease in file size of about 40M => 32M if we were able to remove the neutron_lib.callbacks.manager output. > > I was looking at this again today but I'm really not sure if we should get rid > of those messages from the log. > For now I think that indexing of screen-q-svc.txt file is disabled so this size > of the log shouldn't be big problem (I hope) and I would like to not remove any > other debug messages from it if that will not be really necessary. Maybe as an option we split this into two log files. One that is INFO and above and the other that includes everything with DEBUG? Then we can index the INFO and above contents only. One thing to keep in mind here is that this system tends to act like a canary for when our logs would create problems for people in production. The q-svc logs here are significantly more chatty than the other services. Not necessarily a problem, but don't be surprised if people notice after they upgrade and start complaining. > > > > > But I'm not sure whether that's a critical debugging element or not. 
> > > > -melanie > > > > -- > Slawek Kaplonski > Principal software engineer > Red Hat > > > From ltoscano at redhat.com Fri Aug 28 17:33:15 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Fri, 28 Aug 2020 13:33:15 -0400 (EDT) Subject: [all][goals] Switch legacy Zuul jobs to native - update #3 In-Reply-To: <4658601.CvnuH1ECHv@whitebase.usersys.redhat.com> Message-ID: <2019711119.48014000.1598635995968.JavaMail.zimbra@redhat.com> Hi, it's time for another status report on this goal. A lot of reviews have been merged in the past 10 days, and several projects are not on the list anymore. This is a very good news, but we still have some work to complete. Please keep pushing! Status ====== The number of the project with legacy jobs is now limited, so I'm going to explain the status in more details for each of them. The links to the patches can be found in the etherpad [3] (see below for the links). cinder ------ There is just one test left, and it is definitely tricky, because it implements a cycle of "change tempest configuration"/"run tempest"/"repeat", which is not the usual pattern. In the worst case I will "port" by adding a simple bash wrapper, but I'd like to have a clean ansible solution. designate --------- A patch for the only legacy job is under review after some forth and back, but there are some open questions. heat ------ Only one legacy job left, the heat cores are aware of it. infra ----- Only one devstack-gate job in the os-loganalyze repository, which should be probably retired. There are 2 other legacy jobs, but not devstack-gate, so less urgent. ironic ------ There has been an open review with the full port of the last legacy job, but it is failing. As it is has been failing even before the porting, the patch could be probably merged as it is. karbor ------ A patch for the only legacy job is under review, but it still has some issues. manila ------ There is only one legacy-base job (not devstack-gate), so less urgent, but there is a patch for it. monasca ------- One job in the monasca-transform repository, which is most likely due for a retirement. There are 3 legacy (non devstack-gate) jobs in other repositories. murano ------ There are two legacy jobs left. I'm not sure whether murano-apps-refstackclient-unittest is still needed. murano-dashboard-sanity-check is a bit tricky, the tests still use nose and the corresponding code in horizon has seen several changes. neutron ------- There are three types of legacy jobs: * all jobs in networking-midonet, whose retirement is under discussion, but the final decision is not clear, so a porting may be needed anyway: * two grenade jobs are being worked on; * the remaining legacy job could be maybe dropped. nova ---- The team is trying to port the two legacy job left with some refactoring, but it may require some effort yet. oslo ---- Only one legacy job left, but it is part of the soon-to-be-retired devstack-plugin-zmq repository. senlin ------ A patch for the only legacy job has been proposed and it is working, needs reviews. trove ----- The trove-grenade job should be ported, but on the other hand, trove has no grenade plugin. At this point it is unlikely to be implemented before Victoria, so maybe the job can be dropped for now. zaqar ----- A few patches have been proposed and working. One of them is failing (python-zaqarclient) but it does a bit more than a simple porting, so it may be simply changed to do exactly what the old job was doing (input needed). 
References ========== [1] the goal: https://governance.openstack.org/tc/goals/selected/victoria/native-zuulv3-jobs.html [2] the up-to-date Zuul v3 porting guide: https://docs.openstack.org/project-team-guide/zuulv3.html [3] the etherpad which tracks the current status: https://etherpad.opendev.org/p/goal-victoria-native-zuulv3-migration [4] the previous reports: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016058.html http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016561.html Ciao -- Luigi From romanko at selectel.com Fri Aug 28 17:52:15 2020 From: romanko at selectel.com (=?UTF-8?B?0JjQstCw0L0g0KDQvtC80LDQvdGM0LrQvg==?=) Date: Fri, 28 Aug 2020 20:52:15 +0300 Subject: [tc][telemetry][gnocchi] The future of Gnocchi in OpenStack In-Reply-To: <0a22dd8a-2b54-cd22-1734-619d28d6efc8@catalystcloud.nz> References: <0a22dd8a-2b54-cd22-1734-619d28d6efc8@catalystcloud.nz> Message-ID: пт, 28 авг. 2020 г. в 15:40, Adrian Turjak : > But for those running Gnocchi in prod, this is likely something you may > want to know about and we'd like to hear from you. > Hello, everyone! Here at Selectel we use Gnocchi as a backend for Ceilometer – we gather different metrics from virtual machines and provide our customers with graphs in a control panel. In this scenario we rely on Gnocchi's Keystone auth support and nearly standard mappings for instances, volumes, ports, etc provided out of the box. We also use Gnocchi as a secondary target for our home-grown billing system. Billing measures are gathered from different OpenStack and custom APIs, go through the charging engine and then being POSTed to Gnocchi API in batches. Here again we need the possibility to fetch measures with project- and domain- scoped tokens on the customer side in the control panel to be able to separate scopes for resellers (domain owners) and their clients (project owners). The third way to consume Gnocchi API is through OpenStack Watcher in it's strategy for balancing load in our regions. Here we use hosts metrics as well as virtual machines metrics. What do we like in Gnocchi: - API is clean and easy to use, object model is universal and makes us able to utilize it in different scenarios; - Fast enough for our use cases; - Can store metrics for a long period of time with a ceph backend with no performance penalty – useful in billing case. What we do not like: - server-side aggregations do not work as one might think they should work – API and CLI are very hard to use, we stopped trying to use them; - very CPU and disk IO intensive, platforms are hot like hell 24/7 processing not more then 1k metrics per second; - sometimes deadlocks happen in Redis incoming metrics storage preventing measures from certain metrics from being processed. What are our plans for the nearest future: - try to switch Watcher to Grafana backend to be able to use the same Prometheus metrics we rely on for alerting and capacity planning; - continue using Gnocchi only for VMs mertics, switching billing system for something more reliable in terms of missed points on graphs. Speaking about VMs metrics, it would probably be great to be able to continue using Gnocchi API for customer-facing features as it works well with OpenStack object model, authentication and everything. But Gnocchi's TSDB is not the best on the market. 
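For reference, the batch write path mentioned above is plain HTTP against Gnocchi's batch measures endpoint; a rough sketch follows, with the endpoint path, payload shape and token handling written from memory as assumptions (check them against the Gnocchi API reference before relying on this), and the resource id and metric name as placeholders:

    import datetime
    import requests

    GNOCCHI = "https://gnocchi.example.com"   # placeholder endpoint
    TOKEN = "..."                             # Keystone token, obtained elsewhere

    # One batch: resource id -> metric name -> list of measures.
    batch = {
        "11111111-2222-3333-4444-555555555555": {   # placeholder resource id
            "billing.volume.size": [
                {"timestamp": datetime.datetime.utcnow().isoformat(),
                 "value": 100.0},
            ],
        },
    }

    resp = requests.post(
        GNOCCHI + "/v1/batch/resources/metrics/measures",
        params={"create_metrics": "true"},
        headers={"X-Auth-Token": TOKEN},
        json=batch,
    )
    resp.raise_for_status()
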
By switching it to Victoria Metrics, providing Prometheus API and working amazingly with Grafana, we would be able to gather and store metrics with node/libvirt exporters and Prometheus doing remote writes to Victoria, and consume them via Grafana/AlertManager or Gnocchi API depending on a scenario. -- Ivan Romanko Selectel -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Fri Aug 28 18:56:46 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 28 Aug 2020 13:56:46 -0500 Subject: [release] Release countdown for week R-6 Aug 31 - Sept 4 Message-ID: <20200828185646.GA128227@sm-workstation> Development Focus ----------------- Work on libraries should be wrapping up, in preparation for the various library-related deadlines coming up. Now is a good time to make decisions on deferring feature work to the next development cycle in order to be able to focus on finishing already-started feature work. General Information ------------------- We are now getting close to the end of the cycle, and will be gradually freezing feature work on the various deliverables that make up the OpenStack release. This coming week is the deadline for general libraries (except client libraries): their last feature release needs to happen before "Non-client library freeze" on September 3. Only bugfix releases will be allowed beyond this point. When requesting those library releases, you can also include the stable/victoria branching request with the review (as an example, see the "branches" section here: https://opendev.org/openstack/releases/src/branch/master/deliverables/pike/os-brick.yaml#n2 In the next weeks we will have deadlines for: * Client libraries (think python-*client libraries), which need to have their last feature release before "Client library freeze" (Sept 10) * Deliverables following a cycle-with-rc model (that would be most services), which observe a Feature freeze on that same date, Sept 10. Any feature addition beyond that date should be discussed on the mailing-list and get PTL approval. As we are getting to the point of creating stable/victoria branches, this would be a good point for teams to review membership in their victoria-stable-maint groups. Once the stable/victoria branches are cut for a repo, the ability to approve any necessary backports into those branches for Victoria will be limited to the members of that stable team. If there are any questions about stable policy or stable team membership, please reach out in the #openstack-stable channel. Upcoming Deadlines & Dates -------------------------- Non-client library freeze: September 3 (R-6 week) Client library freeze: September 10 (R-5 week) Victoria-3 milestone: September 10 (R-5 week) Cycle Highlights Due: September 10 (R-5 week) Victoria release: October 14 From zaitcev at redhat.com Fri Aug 28 19:48:44 2020 From: zaitcev at redhat.com (Pete Zaitcev) Date: Fri, 28 Aug 2020 14:48:44 -0500 Subject: [tripleo, ironic] Error: Could not retrieve ... pxelinux.0 Message-ID: <20200828144844.7787707d@suzdal.zaitcev.lan> Hello: I wanted to give the TripleO a try, so started follow our installation guide for Ussuri, and eventually made it to "openstack undercloud install". 
It fails with something like this: Aug 28 10:10:53 undercloud puppet-user[48657]: Error: /Stage[main]/Ironic::Pxe/File[/var/lib/ironic/tftpboot/ipxe.efi]: Could not evaluate: Could not retrieve information from environment production source(s) file:/usr/share/ipxe/ipxe-x86_64.efi Aug 27 20:05:42 undercloud puppet-user[37048]: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[pxelinux.0]/File[/var/lib/ironic/tftpboot/pxelinux.0]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/pxelinux.0 Does anyone have an idea what it wants? I added a couple of packages on the host system that provided the files mentioned in the message, but it made no difference. Ussuri is conteinerized anyway. Since I'm very new to this, I have no clue where to look at all. The nearest task is a wrapper of some kind, so the install-undercloud.log looks like this: 2020-08-28 14:11:31.397 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] TASK [Run container-puppet tasks (generate config) during step 1 with paunch] *** 2020-08-28 14:11:31.397 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] Friday 28 August 2020 14:11:31 -0400 (0:00:00.302) 0:06:28.734 ********* 2020-08-28 14:11:32.223 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] changed: [undercloud] 2020-08-28 14:11:32.325 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] 2020-08-28 14:11:32.326 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] TASK [Wait for container-puppet tasks (generate config) to finish] ************* 2020-08-28 14:11:32.326 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] Friday 28 August 2020 14:11:32 -0400 (0:00:00.928) 0:06:29.663 ********* 2020-08-28 14:11:32.948 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] WAITING FOR COMPLETION: Wait for container-puppet tasks (generate config) to finish (1200 retries left). . . . If anyone could tell roughly what is supposed to be going on here, it would be great. I may be able figure out the rest. Greetings, -- Pete From aschultz at redhat.com Fri Aug 28 20:00:11 2020 From: aschultz at redhat.com (Alex Schultz) Date: Fri, 28 Aug 2020 14:00:11 -0600 Subject: [tripleo, ironic] Error: Could not retrieve ... pxelinux.0 In-Reply-To: <20200828144844.7787707d@suzdal.zaitcev.lan> References: <20200828144844.7787707d@suzdal.zaitcev.lan> Message-ID: I've seen this in the past if there is a mismatch between the host OS and the Containers. Centos7 host with centos8 containers or vice versa. Ussuri should be CentOS8 host OS and make sure you're pulling the correct containers. The Ironic containers have some pathing mismatches when the configuration gets generated around this. It used to be compatible but we broke it at some point when switching some of the tftp location bits. Thanks, -Alex On Fri, Aug 28, 2020 at 1:55 PM Pete Zaitcev wrote: > > Hello: > > I wanted to give the TripleO a try, so started follow our > installation guide for Ussuri, and eventually made it to > "openstack undercloud install". 
It fails with something like this: > > Aug 28 10:10:53 undercloud puppet-user[48657]: Error: /Stage[main]/Ironic::Pxe/File[/var/lib/ironic/tftpboot/ipxe.efi]: Could not evaluate: Could not retrieve information from environment production source(s) file:/usr/share/ipxe/ipxe-x86_64.efi > Aug 27 20:05:42 undercloud puppet-user[37048]: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[pxelinux.0]/File[/var/lib/ironic/tftpboot/pxelinux.0]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/pxelinux.0 > > Does anyone have an idea what it wants? > > I added a couple of packages on the host system that provided > the files mentioned in the message, but it made no difference. > Ussuri is conteinerized anyway. > > Since I'm very new to this, I have no clue where to look at all. > The nearest task is a wrapper of some kind, so the install-undercloud.log > looks like this: > > 2020-08-28 14:11:31.397 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] TASK [Run container-puppet tasks (generate config) during step 1 with paunch] *** > 2020-08-28 14:11:31.397 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] Friday 28 August 2020 14:11:31 -0400 (0:00:00.302) 0:06:28.734 ********* > 2020-08-28 14:11:32.223 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] changed: [undercloud] > 2020-08-28 14:11:32.325 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] > 2020-08-28 14:11:32.326 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] TASK [Wait for container-puppet tasks (generate config) to finish] ************* > 2020-08-28 14:11:32.326 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] Friday 28 August 2020 14:11:32 -0400 (0:00:00.928) 0:06:29.663 ********* > 2020-08-28 14:11:32.948 60599 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] WAITING FOR COMPLETION: Wait for container-puppet tasks (generate config) to finish (1200 retries left). > . . . > > If anyone could tell roughly what is supposed to be going on here, > it would be great. I may be able figure out the rest. > > Greetings, > -- Pete > > From adriant at catalystcloud.nz Fri Aug 28 22:21:48 2020 From: adriant at catalystcloud.nz (Adrian Turjak) Date: Sat, 29 Aug 2020 10:21:48 +1200 Subject: [tc][telemetry][gnocchi] The future of Gnocchi in OpenStack In-Reply-To: <5cc54d3f-ecf3-a769-9edb-187efc1c2d3f@redhat.com> References: <0a22dd8a-2b54-cd22-1734-619d28d6efc8@catalystcloud.nz> <5cc54d3f-ecf3-a769-9edb-187efc1c2d3f@redhat.com> Message-ID: <89261d64-e88e-ff2e-0a28-894c509d50ab@catalystcloud.nz> On 29/08/20 2:48 am, Zane Bitter wrote: > I think a large part of the issue here is that there are multiple > reasons for wanting (small-t) telemetry from OpenStack, and > historically because of reasons they have all been conflated into one > Thing with the result that sometimes one use case wins. At least 3 > that I can think of are: > > 1) Monitoring the OpenStack infrastructure by the operator, including > feeding into business processes like reporting, capacity planning &c. > > 2) Billing > > 3) Monitoring user resources by the user/application, either directly > or via other OpenStack services like Heat or Senlin. > > > For the first, you just want to be able to dump data into a TSDB of > the operator's choice. Since all of the reporting requirements are > business-specific anyway, it's up to the operator to decide how they > want to store the data and how they want to interact with it. 
It > appears that this may have been the theory behind the Gnocchi split. > > On the other hand, for the third one you really need something that > should be an official OpenStack API with all of the attendant > stability guarantees, because it is part of OpenStack's user interface. > > The second lands somewhere in between; AIUI CloudKitty is written to > support multiple back-ends, with OpenStack Telemetry being the primary > one. So it needs a fairly stable API because it's consumed by other > OpenStack projects, but it's ultimately operator-facing. > > > As I have argued before, when we are thinking about road maps we need > to think of these as different use cases, and they're different enough > that they are probably best served by least two separate tools. > > Mohammed has made a compelling argument in the past that Prometheus is > more or less the industry standard for the first use case, and we > should just export metrics to that directly in the OpenStack services, > rather than going through the Ceilometer collector. > > I don't know what should be done about the third, but I do know that > currently Telemetry is breaking Heat's gate and people are seriously > discussing disabling the Telemetry-related tests, which I assume would > mean deprecating the resources. Monasca offers an alternative, but > isn't preferred for some distributors and operators because it brings > the whole Java ecosystem along for the ride (managing the Python one > is already hard enough). > > cheers, > Zane. > You are totally right about the three use cases, and we need to address this as we move forward with Not-Gnocchi and the rest of Telemetry. Internally we've never used OS-Telemetry for case 1, but we do use it for cases 2 and 3. I do think having a stable API for OpenStack for those last two cases is worth it, and I don't think merging those together is too hard. The way Cloudkitty (and our thing Distil) process the data for billing means we aren't needing to store months of data in the telemetry system because we ingest and aggregate into our own systems. The third use case doesn't need much long term data in a high level of granularity, but does (like billing) need high accuracy closer to 'now'. So again I think those line up well to fit into a single system, with maybe different granularity on specific metrics. We should try and fix the telemetry heat tests ideally, because there are people using Aodh and auto-scaling. As for case 1, I agree that trying to encourage Prometheus support in OpenStack is a good aim. Sadly though supporting it directly in each service likely won't be too easy, but Ceilometer already supports pushing to it, so that's good enough for now: https://github.com/openstack/ceilometer/blob/master/ceilometer/publisher/prometheus.py We do need a more coherent future plan for Telemetry in OpenStack, but the starting point is stabilizing and consolidating before we try and steer in a new direction. From mahdi.abbasi.2013 at gmail.com Sat Aug 29 14:33:59 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Sat, 29 Aug 2020 19:03:59 +0430 Subject: Openstack zun ui Message-ID: Hi, I want install zun ui, first i installed horizon with package and then install zun ui with pip and then python horizon_path/manage.py collectstatic But i receive error in httpd finaly. Error: ImportError: No module named zun_ui Please help me Best Regards Mahdi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hongbin034 at gmail.com Sat Aug 29 20:01:31 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Sat, 29 Aug 2020 16:01:31 -0400 Subject: Openstack zun ui In-Reply-To: References: Message-ID: Hi Mahdi, I need more information to help the troubleshooting: * Which version of Horizon you installed (master? stable/ussuri, etc.) * Which version of zun-ui you installed (master? stable/ussuri, etc.) * Which operating system you were using? * What are the outputs of the following commands? $ python --version $ pip freeze $ pip3 freeze On Sat, Aug 29, 2020 at 1:04 PM mahdi abbasi wrote: > Hi, > > I want install zun ui, first i installed horizon with package and then > install zun ui with pip and then python horizon_path/manage.py collectstatic > But i receive error in httpd finaly. > > Error: > ImportError: No module named zun_ui > > Please help me > > Best Regards > Mahdi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Sun Aug 30 10:03:56 2020 From: mkopec at redhat.com (Martin Kopec) Date: Sun, 30 Aug 2020 12:03:56 +0200 Subject: [all][infra] READMEs of zuul roles not rendered properly - missing content In-Reply-To: <14978702-3919-943f-2750-3ecae1201a68@gmail.com> References: <20200824143618.7xdecj67m5jzwpkz@yuggoth.org> <14978702-3919-943f-2750-3ecae1201a68@gmail.com> Message-ID: Let's summarize the facts mentioned in the thread to continue discussion: **1.** ".. zuul:rolevar::" structure is used to generate a documentation so using a different syntax is not an option **2.** turn rendering of rst files off: **2a:** in order to make pointing to the source code easier - that's an interesting topic, probably it would be better to discuss this separately - turning rendering completely off would be a step back from the visual perspective, however, I can imagine that referring to the specific lines of rst files can be needed in some cases. The automatic rst rendering is really useful, f.e when I open a repo I really appreciate that README.rst is rendered under the list of files (f.e. this view [5]). On the other side, when I open specifically a rst file (f.e. this view [6]) I can totally imagine that this file would be opened in a mode when a user can refer to any line like it is with any other source file. If not that, what about adding a new button besides Raw, Permalink, Blame, History called f.e. Source? This functionality would give a certain advantage to opendev.org over github.com **2b:** in order to avoid omitting content - as also mentioned in the comments in [7] turning rendering completely off would be a step back at least from the visual perspective, therefore I'd rather move to the direction where we improved rendering capabilities - either switching to a different rendering tool or just implementing some kind of try except block as Jeremy suggested in [7] - if rendering of a certain block of code throws an error, that part of the code would be rendered as is. [5] https://opendev.org/openstack/tempest [6] https://opendev.org/openstack/tempest/src/branch/master/README.rst [7] https://review.opendev.org/#/c/747796/ On Tue, 25 Aug 2020 at 16:22, Brian Rosmaita wrote: > On 8/24/20 11:05 AM, Clark Boylan wrote: > > On Mon, Aug 24, 2020, at 7:36 AM, Jeremy Stanley wrote: > >> On 2020-08-24 16:12:17 +0200 (+0200), Martin Kopec wrote: > >>> I've noticed that READMEs of zuul roles within openstack projects > >>> are not rendered properly on opendev.org - ".. zuul:rolevar::" > >>> syntax seems to be the problem. 
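As a rough illustration of the try/except fallback mentioned in point 2b, this is what the idea looks like with docutils on the Python side; Gitea actually calls an external renderer, so this is only a sketch of the proposed behaviour rather than the real integration, and the halt_level value is an assumption about how strict the parser should be:

    import html
    from docutils.core import publish_parts
    from docutils.utils import SystemMessage

    def render_rst(source):
        """Render RST to HTML; fall back to escaped plain text on error."""
        try:
            # halt_level=2 turns parser warnings (e.g. an unknown directive
            # such as zuul:rolevar) into exceptions instead of dropped content.
            return publish_parts(
                source, writer_name="html",
                settings_overrides={"halt_level": 2})["html_body"]
        except SystemMessage:
            # Could not render cleanly: show the source as-is rather than
            # omitting part of the document.
            return "<pre>" + html.escape(source) + "</pre>"
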
Although it's rendered well on > >>> github.com, see f.e. [1] [2]. > > [snip] > > >> To be entirely honest, I wish Gitea didn't automatically attempt to > >> render RST files, that makes it harder to actually refer to the > >> source code for them, and it's a source code browser not a CMS for > >> publishing documentation, but apparently this is a feature many > >> other users do like for some reason. > > > > We can change this behavior by removing the external renderer (though I > expect we're in the minority of preferring ability to link to the source > here). > > This may be a bigger minority that you think ... I put up a patch to > change the default behavior to not render RST, so anyone with a strong > opinion, please comment on the patch: > https://review.opendev.org/#/c/747796/ > > > > > [3] > https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gitea/templates/app.ini.j2#L88-L95 > > [4] > https://opendev.org/opendev/system-config/src/branch/master/docker/gitea/Dockerfile#L92-L94 > > > >> -- > >> Jeremy Stanley > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahdi.abbasi.2013 at gmail.com Sun Aug 30 05:31:19 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Sun, 30 Aug 2020 10:01:19 +0430 Subject: Openstack zun ui In-Reply-To: References: Message-ID: - horizon and zun ui are version stable/train - my os is centos 7 Outputs of commands: https://paste.ubuntu.com/p/J8szczHrXF/ On Sun, 30 Aug 2020, 00:31 Hongbin Lu, wrote: > Hi Mahdi, > > I need more information to help the troubleshooting: > > * Which version of Horizon you installed (master? stable/ussuri, etc.) > * Which version of zun-ui you installed (master? stable/ussuri, etc.) > * Which operating system you were using? > * What are the outputs of the following commands? > > $ python --version > $ pip freeze > $ pip3 freeze > > On Sat, Aug 29, 2020 at 1:04 PM mahdi abbasi > wrote: > >> Hi, >> >> I want install zun ui, first i installed horizon with package and then >> install zun ui with pip and then python horizon_path/manage.py collectstatic >> But i receive error in httpd finaly. >> >> Error: >> ImportError: No module named zun_ui >> >> Please help me >> >> Best Regards >> Mahdi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Sun Aug 30 15:41:26 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sun, 30 Aug 2020 11:41:26 -0400 Subject: [octavia] usage of SELECT .. FOR UPDATE Message-ID: Hi everyone, We're being particularly hit hard across different deployments where Octavia has several SELECT .. FOR UPDATE queries which are causing load balancers to fail to provision properly. - spare_pools: This usually hits on rolling restarts of o-housekeeping as they all seem to try to capture a lock -- https://github.com/openstack/octavia/blob/73fbc05386b512aa1dd86a0ed6e8455cc6b8dc7f/octavia/controller/housekeeping/house_keeping.py#L54 - quota: This hits when provisioning a lot of load balancers in parallel. For example in cases when using Heat -- https://github.com/openstack/octavia/blob/bf3d5372b9fc670ecd08339fa989c9b738ad8d69/octavia/db/repositories.py#L565-L566 These hurt quite a lot in a busy deployment and result in a poor user experience unfortunately. We're trying to off-load Octavia to it's own database server but that is more of a "throw power at the problem" solution. I can imagine that we can probably likely look into a better/cleaner alternative that avoids this entirely? 
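For readers not familiar with the pattern under discussion, the locking in question is the SQLAlchemy SELECT ... FOR UPDATE idiom, roughly as sketched below; this is a generic illustration of where the contention comes from, not Octavia's actual quota code, and the model, column names and connection URL are made up:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Quota(Base):                       # made-up model, not Octavia's
        __tablename__ = "quotas"
        project_id = Column(String(36), primary_key=True)
        total = Column(Integer, nullable=False)
        in_use = Column(Integer, nullable=False, default=0)

    engine = create_engine("mysql+pymysql://user:pass@db/octavia")  # placeholder
    Session = sessionmaker(bind=engine)

    def reserve(project_id, requested):
        session = Session()
        try:
            # with_for_update() emits SELECT ... FOR UPDATE: any other
            # transaction touching this project's row blocks here until
            # this one commits or rolls back.
            quota = (session.query(Quota)
                     .filter_by(project_id=project_id)
                     .with_for_update()
                     .one())
            if quota.in_use + requested > quota.total:
                raise ValueError("quota exceeded")
            quota.in_use += requested
            session.commit()
        except Exception:
            session.rollback()
            raise
        finally:
            session.close()
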
I'm happy to try and push for some of this work on our side. Thanks, Mohammed -- Mohammed Naser VEXXHOST, Inc. From hongbin034 at gmail.com Sun Aug 30 18:31:27 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Sun, 30 Aug 2020 14:31:27 -0400 Subject: Openstack zun ui In-Reply-To: References: Message-ID: On Sun, Aug 30, 2020 at 1:31 AM mahdi abbasi wrote: > - horizon and zun ui are version stable/train > - my os is centos 7 > Outputs of commands: > > https://paste.ubuntu.com/p/J8szczHrXF/ > I saw you were using horizon==18.4.1, which is basically the latest version of Horizon. The zun-ui version is zun-ui==4.0.x which is stable/train. This version of zun-ui doesn't match the version of horizon. If you want to use horizon 18.4.1, suggest to install zun-ui from master branch. Alternatively, you can re-install horizon 16.x.x to match zun-ui 4.0.x. > > > > On Sun, 30 Aug 2020, 00:31 Hongbin Lu, wrote: > >> Hi Mahdi, >> >> I need more information to help the troubleshooting: >> >> * Which version of Horizon you installed (master? stable/ussuri, etc.) >> * Which version of zun-ui you installed (master? stable/ussuri, etc.) >> * Which operating system you were using? >> * What are the outputs of the following commands? >> >> $ python --version >> $ pip freeze >> $ pip3 freeze >> >> On Sat, Aug 29, 2020 at 1:04 PM mahdi abbasi >> wrote: >> >>> Hi, >>> >>> I want install zun ui, first i installed horizon with package and then >>> install zun ui with pip and then python horizon_path/manage.py collectstatic >>> But i receive error in httpd finaly. >>> >>> Error: >>> ImportError: No module named zun_ui >>> >>> Please help me >>> >>> Best Regards >>> Mahdi >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahdi.abbasi.2013 at gmail.com Sun Aug 30 19:50:49 2020 From: mahdi.abbasi.2013 at gmail.com (mahdi abbasi) Date: Mon, 31 Aug 2020 00:20:49 +0430 Subject: Openstack zun ui In-Reply-To: References: Message-ID: Thanks a lot On Sun, 30 Aug 2020, 23:01 Hongbin Lu, wrote: > > > On Sun, Aug 30, 2020 at 1:31 AM mahdi abbasi > wrote: > >> - horizon and zun ui are version stable/train >> - my os is centos 7 >> Outputs of commands: >> >> https://paste.ubuntu.com/p/J8szczHrXF/ >> > > I saw you were using horizon==18.4.1, which is basically the latest > version of Horizon. The zun-ui version is zun-ui==4.0.x which is > stable/train. This version of zun-ui doesn't match the version of horizon. > If you want to use horizon 18.4.1, suggest to install zun-ui from master > branch. Alternatively, you can re-install horizon 16.x.x to match zun-ui > 4.0.x. > > >> >> >> >> On Sun, 30 Aug 2020, 00:31 Hongbin Lu, wrote: >> >>> Hi Mahdi, >>> >>> I need more information to help the troubleshooting: >>> >>> * Which version of Horizon you installed (master? stable/ussuri, etc.) >>> * Which version of zun-ui you installed (master? stable/ussuri, etc.) >>> * Which operating system you were using? >>> * What are the outputs of the following commands? >>> >>> $ python --version >>> $ pip freeze >>> $ pip3 freeze >>> >>> On Sat, Aug 29, 2020 at 1:04 PM mahdi abbasi < >>> mahdi.abbasi.2013 at gmail.com> wrote: >>> >>>> Hi, >>>> >>>> I want install zun ui, first i installed horizon with package and then >>>> install zun ui with pip and then python horizon_path/manage.py collectstatic >>>> But i receive error in httpd finaly. 
>>>> >>>> Error: >>>> ImportError: No module named zun_ui >>>> >>>> Please help me >>>> >>>> Best Regards >>>> Mahdi >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From iwienand at redhat.com Mon Aug 31 02:27:33 2020 From: iwienand at redhat.com (Ian Wienand) Date: Mon, 31 Aug 2020 12:27:33 +1000 Subject: Setuptools 50 and Devstack Failures [was Re: Setuptools 48 and Devstack Failures] In-Reply-To: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> Message-ID: <20200831022733.GA287001@fedora19.localdomain> On Fri, Jul 03, 2020 at 12:13:04PM -0700, Clark Boylan wrote: > Setuptools has made a new version 48 release. This appears to be > causing problems for devstack because `pip install -e $PACKAGE_PATH` > installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it > did in the past. `pip install $PACKAGE_PATH` continues to install to > /usr/local/bin as expected. Devstack is failing because > keystone-manage cannot currently be found at the specific > /usr/local/bin/ path. This is now back with setuptools 50.0.0 [1], see the original issue [2]. The problems are limited to instances where jobs are installing with pip as root into the system environment on platforms that override the default install path (debuntu). The confluence of this set of requirements of neatly describes most devstack testing :/ There's two visible problems; both stem from the same issue. Packaged Debuntu python installs things into dist-packages; leaving site-packages for a non-packaged interpreter, should you wish to install such a thing. It patches distutils to provide this behaviour. The other thing it does is makes pip installs use /usr/local/bin, rather that /usr/bin. Thus it is unfortunately not just s,/usr/local/bin,/usr/bin,g because the new setuptools will install all the libraries into site-packages; which the packaged python intpreter doesn't know to look for. Using SETUPTOOLS_USE_DISTUTILS=stdlib with such installs is one option; it feels like it just makes for more confusing bifurcation. We can't really set this in stack.sh as a global, because we wouldn't want this to apply to subshells that are installing in virtualenv's, for example. It might be the best option. A more radical thought; perhaps we could install a non-packaged python interpreter for devstack runs. Isolation from packaging cuts both ways; while we might work around packaging issues in CI, we're also working around packaging issues that then just hit you when you're in production. The eternal question of "what are we testing". I don't think there's an easy answer. Which is probably why we've ended up here with everything broken ... -i [1] https://github.com/pypa/setuptools/commit/04e3df22df840c6bb244e9b27bc56750c44b7c85 [2] https://github.com/pypa/setuptools/issues/2232 From johnsomor at gmail.com Mon Aug 31 06:19:58 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Sun, 30 Aug 2020 23:19:58 -0700 Subject: [octavia] usage of SELECT .. FOR UPDATE In-Reply-To: References: Message-ID: Hi Mohammed, Have you opened stories for these issues? I haven't seen any bug reports about this. If not, could you capture your information in stories for us to work against? I am not sure I follow the issue fully, so hopefully we can clarify. Housekeeping, when spares pool is enabled, boot spare amphora VMs. I'm not sure how that could inhibit load balancers from provisioning. 
Sure, some periodic jobs in the housekeeping process may deadlock and not complete booting spare VMs, but this will not block any load balancer provisioning. If there are no spares available the worker will simply boot a VM as it would normally do without spares enabled (This was functionality we added to Taskflow from the beginning to make sure we didn't have issues blocking load balancers from provisioning if the spares pool was depleted). This lock was added at an operators request as they did not want any "extra" amphora booted beyond the configured spares pool limit. The quota management does lock the project during the critical phase of managing the quota for the project, just like every OpenStack project. If that is not completing the quota update in a timely manner, please open a story with the logs so we can investigate. I assume your application is correctly designed to handle an asynchronous API (such as neutron, Octavia, etc.) and handle any responses that indicate the object is currently immutable and will retry the request. Michael On Sun, Aug 30, 2020 at 8:47 AM Mohammed Naser wrote: > > Hi everyone, > > We're being particularly hit hard across different deployments where > Octavia has several SELECT .. FOR UPDATE queries which are causing > load balancers to fail to provision properly. > > - spare_pools: This usually hits on rolling restarts of o-housekeeping > as they all seem to try to capture a lock -- > https://github.com/openstack/octavia/blob/73fbc05386b512aa1dd86a0ed6e8455cc6b8dc7f/octavia/controller/housekeeping/house_keeping.py#L54 > > - quota: This hits when provisioning a lot of load balancers in > parallel. For example in cases when using Heat -- > https://github.com/openstack/octavia/blob/bf3d5372b9fc670ecd08339fa989c9b738ad8d69/octavia/db/repositories.py#L565-L566 > > These hurt quite a lot in a busy deployment and result in a poor user > experience unfortunately. We're trying to off-load Octavia to it's > own database server but that is more of a "throw power at the problem" > solution. I can imagine that we can probably likely look into a > better/cleaner alternative that avoids this entirely? > > I'm happy to try and push for some of this work on our side. > > Thanks, > Mohammed > > -- > Mohammed Naser > VEXXHOST, Inc. > From iwienand at redhat.com Mon Aug 31 06:46:22 2020 From: iwienand at redhat.com (Ian Wienand) Date: Mon, 31 Aug 2020 16:46:22 +1000 Subject: Setuptools 50 and Devstack Failures [was Re: Setuptools 48 and Devstack Failures] In-Reply-To: <20200831022733.GA287001@fedora19.localdomain> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> <20200831022733.GA287001@fedora19.localdomain> Message-ID: <20200831064622.GB287001@fedora19.localdomain> On Mon, Aug 31, 2020 at 12:27:33PM +1000, Ian Wienand wrote: > Thus it is unfortunately not just s,/usr/local/bin,/usr/bin,g because > the new setuptools will install all the libraries into site-packages; > which the packaged python intpreter doesn't know to look for. https://review.opendev.org/748937 was where I tried this before I understood the above. > Using SETUPTOOLS_USE_DISTUTILS=stdlib with such installs is one > option; it feels like it just makes for more confusing bifurcation. > We can't really set this in stack.sh as a global, because we wouldn't > want this to apply to subshells that are installing in virtualenv's, > for example. It might be the best option. https://review.opendev.org/748957 is this option; this should hook into pip_install function. 
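For anyone wanting to confirm which layout a host actually ended up with after a root pip install (the dist-packages plus /usr/local/bin split described in the first message versus the new site-packages behaviour), a small stand-alone check like the following shows where the interpreter will look for packages versus where the console script landed; keystone-manage is only used as a convenient example entry point:

    import shutil
    import site
    import sysconfig

    # Where will "import keystone" actually resolve from?
    print("purelib:", sysconfig.get_paths()["purelib"])
    print("site dirs:", site.getsitepackages())

    # Where did pip put the console script?
    print("keystone-manage ->", shutil.which("keystone-manage"))

    # On Bionic the expected answers are /usr/lib/python3/dist-packages plus
    # /usr/local/bin; with the new setuptools the libraries end up under a
    # site-packages directory the packaged interpreter does not search.
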
How many plugins do "sudo pip install ..." I don't know; they would all still be broken with this. But as mentioned, we don't want to set this globally to avoid setting it for virtualenv installs. > A more radical thought; perhaps we could install a non-packaged python > interpreter for devstack runs. Isolation from packaging cuts both > ways; while we might work around packaging issues in CI, we're also > working around packaging issues that then just hit you when you're in > production. The eternal question of "what are we testing". On further consideration I don't think this is a great idea. Lots of things do #!/usr/bin/python3 which is always going to be the packaged Python. I imagine we'd have quite a mess of things not understanding which python their libraries are installed for. Another thing that failed was just using the system packaged pip; https://review.opendev.org/748942. In theory that would be OK, and obviously patched correctly for the distro, but unfortunately the bionic pip is so old it doesn't pull down manylinux2010 wheels and so there's assorted build breakages from packages that now have to build. https://review.opendev.org/748943/ is a pin to <50 in requirements. devstack uses requirements to install setuptools in it's tools/install_pip.sh so this does move the system back to a version without this change. Obviously this doesn't fix the underlying problem, but helps the gate. -i From dtantsur at redhat.com Mon Aug 31 09:42:03 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Mon, 31 Aug 2020 11:42:03 +0200 Subject: Setuptools 50 and Devstack Failures [was Re: Setuptools 48 and Devstack Failures] In-Reply-To: <20200831022733.GA287001@fedora19.localdomain> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> <20200831022733.GA287001@fedora19.localdomain> Message-ID: On Mon, Aug 31, 2020 at 4:32 AM Ian Wienand wrote: > On Fri, Jul 03, 2020 at 12:13:04PM -0700, Clark Boylan wrote: > > Setuptools has made a new version 48 release. This appears to be > > causing problems for devstack because `pip install -e $PACKAGE_PATH` > > installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it > > did in the past. `pip install $PACKAGE_PATH` continues to install to > > /usr/local/bin as expected. Devstack is failing because > > keystone-manage cannot currently be found at the specific > > /usr/local/bin/ path. > > This is now back with setuptools 50.0.0 [1], see the original issue > [2]. > > The problems are limited to instances where jobs are installing with > pip as root into the system environment on platforms that override the > default install path (debuntu). The confluence of this set of > requirements of neatly describes most devstack testing :/ > > There's two visible problems; both stem from the same issue. Packaged > Debuntu python installs things into dist-packages; leaving > site-packages for a non-packaged interpreter, should you wish to > install such a thing. It patches distutils to provide this behaviour. > The other thing it does is makes pip installs use /usr/local/bin, > rather that /usr/bin. > > Thus it is unfortunately not just s,/usr/local/bin,/usr/bin,g because > the new setuptools will install all the libraries into site-packages; > which the packaged python intpreter doesn't know to look for. > > Using SETUPTOOLS_USE_DISTUTILS=stdlib with such installs is one > option; it feels like it just makes for more confusing bifurcation. 
> We can't really set this in stack.sh as a global, because we wouldn't > want this to apply to subshells that are installing in virtualenv's, > for example. It might be the best option. > > A more radical thought; perhaps we could install a non-packaged python > interpreter for devstack runs. Isolation from packaging cuts both > ways; while we might work around packaging issues in CI, we're also > working around packaging issues that then just hit you when you're in > production. The eternal question of "what are we testing". > > I don't think there's an easy answer. Which is probably why we've > ended up here with everything broken ... > Is it the right time to discuss switching to virtual environments? We've had quite a positive experience with bifrost since we stopped trying global installations at all. Dmitry > > -i > > [1] > https://github.com/pypa/setuptools/commit/04e3df22df840c6bb244e9b27bc56750c44b7c85 > [2] https://github.com/pypa/setuptools/issues/2232 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0ne at e0ne.info Mon Aug 31 12:51:10 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Mon, 31 Aug 2020 15:51:10 +0300 Subject: [horizon] horizon-integration-tests job is broken Message-ID: Hi team, Please, do not trigger recheck if horizon-integration-tests failed like [1]: ______________ TestDashboardHelp.test_dashboard_help_redirection _______________ 'NoneType' object is not iterable While I'm trying to figure out what is happening there [2], any help with troubleshooting is welcome. [1] https://51bb980dc10c72928109-9873e0e5415ff38d9f1a5cc3b1681b19.ssl.cf1.rackcdn.com/744847/2/check/horizon-integration-tests/62ace86/job-output.txt [2] https://review.opendev.org/#/c/749011 Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yan.y.zhao at intel.com Mon Aug 31 02:23:38 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Mon, 31 Aug 2020 10:23:38 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200828154741.30cfc1a3.cohuck@redhat.com> References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <20200819212234.223667b3@x1.home> <20200820031621.GA24997@joy-OptiPlex-7040> <20200825163925.1c19b0f0.cohuck@redhat.com> <20200826064117.GA22243@joy-OptiPlex-7040> <20200828154741.30cfc1a3.cohuck@redhat.com> Message-ID: <20200831022338.GA13784@joy-OptiPlex-7040> On Fri, Aug 28, 2020 at 03:47:41PM +0200, Cornelia Huck wrote: > On Wed, 26 Aug 2020 14:41:17 +0800 > Yan Zhao wrote: > > > previously, we want to regard the two mdevs created with dsa-1dwq x 30 and > > dsa-2dwq x 15 as compatible, because the two mdevs consist equal resources. > > > > But, as it's a burden to upper layer, we agree that if this condition > > happens, we still treat the two as incompatible. > > > > To fix it, either the driver should expose dsa-1dwq only, or the target > > dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30. > > AFAIU, these are mdev types, aren't they? So, basically, any management > software needs to take care to use the matching mdev type on the target > system for device creation? dsa-1dwq is the mdev type. there's no dsa-2dwq yet. and I think no dsa-2dwq should be provided in future according to our discussion. 
GVT currently does not support aggregator also. how to add the the aggregator attribute is currently uder discussion, and up to now it is recommended to be a vendor specific attributes. https://lists.freedesktop.org/archives/intel-gvt-dev/2020-July/006854.html. Thanks Yan From jasowang at redhat.com Mon Aug 31 03:07:53 2020 From: jasowang at redhat.com (Jason Wang) Date: Mon, 31 Aug 2020 11:07:53 +0800 Subject: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices In-Reply-To: <20200821165255.53e26628.cohuck@redhat.com> References: <20200818085527.GB20215@redhat.com> <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200819033035.GA21172@joy-OptiPlex-7040> <20200819065951.GB21172@joy-OptiPlex-7040> <20200819081338.GC21172@joy-OptiPlex-7040> <20200820142740.6513884d.cohuck@redhat.com> <20200821165255.53e26628.cohuck@redhat.com> Message-ID: On 2020/8/21 下午10:52, Cornelia Huck wrote: > On Fri, 21 Aug 2020 11:14:41 +0800 > Jason Wang wrote: > >> On 2020/8/20 下午8:27, Cornelia Huck wrote: >>> On Wed, 19 Aug 2020 17:28:38 +0800 >>> Jason Wang wrote: >>> >>>> On 2020/8/19 下午4:13, Yan Zhao wrote: >>>>> On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote: >>>>>> On 2020/8/19 下午2:59, Yan Zhao wrote: >>>>>>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote: >>>>>>>> On 2020/8/19 上午11:30, Yan Zhao wrote: >>>>>>>>> hi All, >>>>>>>>> could we decide that sysfs is the interface that every VFIO vendor driver >>>>>>>>> needs to provide in order to support vfio live migration, otherwise the >>>>>>>>> userspace management tool would not list the device into the compatible >>>>>>>>> list? >>>>>>>>> >>>>>>>>> if that's true, let's move to the standardizing of the sysfs interface. >>>>>>>>> (1) content >>>>>>>>> common part: (must) >>>>>>>>> - software_version: (in major.minor.bugfix scheme) >>>>>>>> This can not work for devices whose features can be negotiated/advertised >>>>>>>> independently. (E.g virtio devices) >>> I thought the 'software_version' was supposed to describe kind of a >>> 'protocol version' for the data we transmit? I.e., you add a new field, >>> you bump the version number. >> >> Ok, but since we mandate backward compatibility of uABI, is this really >> worth to have a version for sysfs? (Searching on sysfs shows no examples >> like this) > I was not thinking about the sysfs interface, but rather about the data > that is sent over while migrating. E.g. we find out that sending some > auxiliary data is a good idea and bump to version 1.1.0; version 1.0.0 > cannot deal with the extra data, but version 1.1.0 can deal with the > older data stream. > > (...) Well, I think what data to transmit during migration is the duty of qemu not kernel. And I suspect the idea of reading opaque data (with version) from kernel and transmit them to dest is the best approach. > >>>>>>>>> - device_api: vfio-pci or vfio-ccw ... >>>>>>>>> - type: mdev type for mdev device or >>>>>>>>> a signature for physical device which is a counterpart for >>>>>>>>> mdev type. >>>>>>>>> >>>>>>>>> device api specific part: (must) >>>>>>>>> - pci id: pci id of mdev parent device or pci id of physical pci >>>>>>>>> device (device_api is vfio-pci)API here. >>>>>>>> So this assumes a PCI device which is probably not true. >>>>>>>> >>>>>>> for device_api of vfio-pci, why it's not true? >>>>>>> >>>>>>> for vfio-ccw, it's subchannel_type. 
>>>>>> Ok but having two different attributes for the same file is not good idea. >>>>>> How mgmt know there will be a 3rd type? >>>>> that's why some attributes need to be common. e.g. >>>>> device_api: it's common because mgmt need to know it's a pci device or a >>>>> ccw device. and the api type is already defined vfio.h. >>>>> (The field is agreed by and actually suggested by Alex in previous mail) >>>>> type: mdev_type for mdev. if mgmt does not understand it, it would not >>>>> be able to create one compatible mdev device. >>>>> software_version: mgmt can compare the major and minor if it understands >>>>> this fields. >>>> I think it would be helpful if you can describe how mgmt is expected to >>>> work step by step with the proposed sysfs API. This can help people to >>>> understand. >>> My proposal would be: >>> - check that device_api matches >>> - check possible device_api specific attributes >>> - check that type matches [I don't think the combination of mdev types >>> and another attribute to determine compatibility is a good idea; >> >> Any reason for this? Actually if we only use mdev type to detect the >> compatibility, it would be much more easier. Otherwise, we are actually >> re-inventing mdev types. >> >> E.g can we have the same mdev types with different device_api and other >> attributes? > In the end, the mdev type is represented as a string; but I'm not sure > we can expect that two types with the same name, but a different > device_api are related in any way. > > If we e.g. compare vfio-pci and vfio-ccw, they are fundamentally > different. > > I was mostly concerned about the aggregation proposal, where type A + > aggregation value b might be compatible with type B + aggregation value > a. Yes, that looks pretty complicated. > >> >>> actually, the current proposal confuses me every time I look at it] >>> - check that software_version is compatible, assuming semantic >>> versioning >>> - check possible type-specific attributes >> >> I'm not sure if this is too complicated. And I suspect there will be >> vendor specific attributes: >> >> - for compatibility check: I think we should either modeling everything >> via mdev type or making it totally vendor specific. Having something in >> the middle will bring a lot of burden > FWIW, I'm for a strict match on mdev type, and flexibility in per-type > attributes. I'm not sure whether the above flexibility can work better than encoding them to mdev type. If we really want ultra flexibility, we need making the compatibility check totally vendor specific. > >> - for provisioning: it's still not clear. As shown in this proposal, for >> NVME we may need to set remote_url, but unless there will be a subclass >> (NVME) in the mdev (which I guess not), we can't prevent vendor from >> using another attribute name, in this case, tricks like attributes >> iteration in some sub directory won't work. So even if we had some >> common API for compatibility check, the provisioning API is still vendor >> specific ... > Yes, I'm not sure how to deal with the "same thing for different > vendors" problem. We can try to make sure that in-kernel drivers play > nicely, but not much more. Then it's actually a subclass of mdev I guess in the future. 
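To make the flow outlined above a little more concrete, here is a rough sketch of what a management tool's check could look like; the attribute names (device_api, type, software_version) are the ones proposed in this thread, but the sysfs layout, the strict type match and the exact semantic-versioning rule are assumptions, since that is precisely what is still being decided:

    import os

    ATTRS = ("device_api", "type", "software_version")

    def read_attrs(sysfs_dir):
        # Layout is an assumption: one file per attribute under the device dir.
        out = {}
        for name in ATTRS:
            with open(os.path.join(sysfs_dir, name)) as f:
                out[name] = f.read().strip()
        return out

    def compatible(src_dir, dst_dir):
        src, dst = read_attrs(src_dir), read_attrs(dst_dir)
        if src["device_api"] != dst["device_api"]:
            return False
        if src["type"] != dst["type"]:          # strict mdev type match
            return False
        # major.minor.bugfix: same major, destination minor not older.
        s_major, s_minor = (int(x) for x in src["software_version"].split(".")[:2])
        d_major, d_minor = (int(x) for x in dst["software_version"].split(".")[:2])
        return s_major == d_major and d_minor >= s_minor
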
Thanks From yan.y.zhao at intel.com Mon Aug 31 04:43:44 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Mon, 31 Aug 2020 12:43:44 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <8f5345be73ebf4f8f7f51d6cdc9c2a0d8e0aa45e.camel@redhat.com> References: <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com> <20200818091628.GC20215@redhat.com> <20200818113652.5d81a392.cohuck@redhat.com> <20200820003922.GE21172@joy-OptiPlex-7040> <20200819212234.223667b3@x1.home> <20200820031621.GA24997@joy-OptiPlex-7040> <20200825163925.1c19b0f0.cohuck@redhat.com> <20200826064117.GA22243@joy-OptiPlex-7040> <20200828154741.30cfc1a3.cohuck@redhat.com> <8f5345be73ebf4f8f7f51d6cdc9c2a0d8e0aa45e.camel@redhat.com> Message-ID: <20200831044344.GB13784@joy-OptiPlex-7040> On Fri, Aug 28, 2020 at 03:04:12PM +0100, Sean Mooney wrote: > On Fri, 2020-08-28 at 15:47 +0200, Cornelia Huck wrote: > > On Wed, 26 Aug 2020 14:41:17 +0800 > > Yan Zhao wrote: > > > > > previously, we want to regard the two mdevs created with dsa-1dwq x 30 and > > > dsa-2dwq x 15 as compatible, because the two mdevs consist equal resources. > > > > > > But, as it's a burden to upper layer, we agree that if this condition > > > happens, we still treat the two as incompatible. > > > > > > To fix it, either the driver should expose dsa-1dwq only, or the target > > > dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30. > > > > AFAIU, these are mdev types, aren't they? So, basically, any management > > software needs to take care to use the matching mdev type on the target > > system for device creation? > > or just do the simple thing of use the same mdev type on the source and dest. > matching mdevtypes is not nessiarly trivial. we could do that but we woudl have > to do that in python rather then sql so it would be slower to do at least today. > > we dont currently have the ablity to say the resouce provider must have 1 of these > set of traits. just that we must have a specific trait. this is a feature we have > disucssed a couple of times and delayed untill we really really need it but its not out > of the question that we could add it for this usecase. i suspect however we would do exact > match first and explore this later after the inital mdev migration works. Yes, I think it's good. still, I'd like to put it more explicitly to make ensure it's not missed: the reason we want to specify compatible_type as a trait and check whether target compatible_type is the superset of source compatible_type is for the consideration of backward compatibility. e.g. an old generation device may have a mdev type xxx-v4-yyy, while a newer generation device may be of mdev type xxx-v5-yyy. with the compatible_type traits, the old generation device is still able to be regarded as compatible to newer generation device even their mdev types are not equal. Thanks Yan > by the way i was looking at some vdpa reslated matiail today and noticed vdpa devices are nolonger > usign mdevs and and now use a vhost chardev so i guess we will need a completely seperate mechanioum > for vdpa vs mdev migration as a result. that is rather unfortunet but i guess that is life. > > > From mnaser at vexxhost.com Mon Aug 31 15:42:54 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 31 Aug 2020 11:42:54 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. 
We've also included a few references to some important mailing list threads that you should check out. # Patches ## Open Reviews - Move ansible-role-XXX-hsm projects to Barbican team https://review.opendev.org/748027 - Retire the devstack-plugin-zmq project https://review.opendev.org/748731 - Retire devstack-plugin-pika project https://review.opendev.org/748730 - Add openstack-ansible/os_senlin role https://review.opendev.org/748677 - Drop all exceptions for legacy validation https://review.opendev.org/745403 - Add openstack-helm-releases to openstack-helm https://review.opendev.org/748302 - Add assert:supports-standalone. https://review.opendev.org/722399 ## Project Updates - Add etcd3gw to Oslo https://review.opendev.org/747188 ## General Changes - Update and simplify comparison of working groups https://review.opendev.org/746763 - Move towards dual office hours in diff TZ https://review.opendev.org/746167 ## Abandoned Changes - Drop requirement of 1/3 positive TC votes to land https://review.opendev.org/746711 - Move towards single office hour https://review.opendev.org/745200 # Email Threads - vPTG October 2020 Signup: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016497.html Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From kendall at openstack.org Mon Aug 31 17:37:48 2020 From: kendall at openstack.org (Kendall Waters) Date: Mon, 31 Aug 2020 12:37:48 -0500 Subject: vPTG October 2020 Team Signup Reminder Message-ID: <5F13B10F-C0C5-4761-8AD2-9B3A55F67441@openstack.org> Hello Everyone! Wanted to give you all a reminder that the deadline for signing up teams for the PTG is approaching! The virtual PTG will be held from Monday October 26th to Friday October 30th, 2020. To signup your team, you must complete BOTH the survey[1] AND reserve time in the ethercalc[2] by September 11th at 7:00 UTC. We ask that the PTL/SIG Chair/Team lead sign up for time to have their discussions in with 4 rules/guidelines. 1. Cross project discussions (like SIGs or support project teams) should be scheduled towards the start of the week so that any discussions that might shape those of other teams happen first. 2. No team should sign up for more than 4 hours per UTC day to help keep participants actively engaged. 3. No team should sign up for more than 16 hours across all time slots to avoid burning out our contributors and to enable participation in multiple teams discussions. Once your team is signed up, please register[3]! And remind your team to register! Registration is free, but since it will be how we contact you with passwords, event details, etc. it is still important! If you have any questions, please let us know. -The Kendalls (diablo_rojo & wendallkaters) [1] Team Survey: https://openstackfoundation.formstack.com/forms/oct2020_vptg_survey [2] Ethercalc Signup: https://ethercalc.openstack.org/7xp2pcbh1ncb [3] PTG Registration: https://october2020ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Mon Aug 31 19:22:29 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Mon, 31 Aug 2020 21:22:29 +0200 Subject: [TripleO][Ussuri] image prepare, cannot download containers image prepare recently Message-ID: Hi all, I have noticed, that recently my undercloud is not able to download images [0]. 
I have provided the newly generated containers-prepare-parameter.yaml and the
outputs from container image prepare run with --verbose and, at the end, the
beginning of a --debug run [0]. Were there any changes? Running
"openstack tripleo container image prepare default --output-env-file containers-prepare-parameter.yaml --local-push-destination"
has prepared a slightly different file compared to what it generated previously:

NEW # namespace: docker.io/tripleou VS namespace: docker.io/tripleomaster # OLD

[0] - http://paste.openstack.org/show/rBCNAQJBEe9y7CKyi9aG/

--
Ruslanas Gžibovskis
+370 6030 7030

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From skaplons at redhat.com Mon Aug 31 22:02:46 2020
From: skaplons at redhat.com (Slawek Kaplonski)
Date: Tue, 1 Sep 2020 00:02:46 +0200
Subject: [neutron] Team meeting - Tuesday 01.09.2020
Message-ID: <20200831220246.cignba25ghfgqsmf@skaplons-mac>

Hi,

I will be on PTO on Tuesday and I will not be able to run our team meeting.
So let's cancel it this week and see You all at the meeting next week.

For this week there are 3 important things from me:

* Please check the doodle: https://doodle.com/poll/2ppmnua2nuva5nyp and put
  there the time slots which work best for You for the PTG in October - please
  do that this week, as next week I need to book some slots for us,
* As we are close to the Victoria-3 milestone (next week), which is also the
  feature freeze week, please focus now on reviewing patches for BPs targeted
  for this cycle: https://wiki.openstack.org/wiki/Network/Meetings#Blueprints
* The new bug deputy rotation starts this week. Please check the new schedule
  at https://wiki.openstack.org/wiki/Network/Meetings#Bug_deputy and let me
  know if it doesn't work for You.

That's all from me for this week. See You all online :)

--
Slawek Kaplonski
Principal software engineer
Red Hat

From arunkumar.palanisamy at tcs.com Mon Aug 31 19:47:15 2020
From: arunkumar.palanisamy at tcs.com (ARUNKUMAR PALANISAMY)
Date: Mon, 31 Aug 2020 19:47:15 +0000
Subject: Trove images for Cluster testing.
In-Reply-To: 
References: 
Message-ID: 

Hi Lingxian,

Hope you are doing well. Thank you for your mail and the detailed information.

We would like to join the #openstack-trove IRC channel for discussions. Could
you please advise us on the process to join the IRC channel?

We understand that currently there is no IRC meeting happening for Trove. If
any meeting is scheduled, we would like to join it to understand the ongoing
work and progress of Trove and to contribute further.

Regards,
Arunkumar Palanisamy

From: Lingxian Kong
Sent: Friday, August 28, 2020 12:09 AM
To: ARUNKUMAR PALANISAMY
Cc: openstack-discuss at lists.openstack.org; Pravin Mohan
Subject: Re: Trove images for Cluster testing.

"External email. Open with Caution"

Hi Arunkumar,

Unfortunately, for now Trove only supports MySQL and MariaDB; I'm working on
adding PostgreSQL support. All other datastores are unmaintained right now.

Since this (Victoria) dev cycle, a docker container has been introduced in the
Trove guest agent in order to remove the maintenance overhead of multiple Trove
guest images. We only need to maintain one single guest image but can still
support different datastores. We have to do that because the Trove team in the
community is so small.

If supporting Redis, Cassandra, MongoDB or Couchbase is among your feature
requests, you are welcome to contribute to Trove.

Please let me know if you have any other questions. You are also welcome to
join the #openstack-trove IRC channel for discussion.
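To illustrate the single-guest-image approach mentioned above, here is a rough
conceptual sketch (this is not Trove's actual guest agent code; the image names,
paths and helper function are invented for illustration):

    # Conceptual sketch only -- NOT Trove's actual guest agent code.
    # It only illustrates the idea of one generic guest image that starts the
    # requested datastore as a docker container at runtime.
    import docker  # docker SDK for Python, assumed to be on the guest image

    DATASTORE_IMAGES = {  # hypothetical image choices
        "mysql": "mysql:8.0",
        "mariadb": "mariadb:10.4",
    }

    def start_datastore(datastore, data_dir="/var/lib/dbdata"):
        """Run the requested datastore in a container on the guest VM."""
        client = docker.from_env()
        return client.containers.run(
            DATASTORE_IMAGES[datastore],
            detach=True,
            name="trove-%s" % datastore,
            network_mode="host",
            volumes={data_dir: {"bind": "/var/lib/data", "mode": "rw"}},
        )

With something like this, adding a new datastore is mostly a matter of pointing
the guest agent at a different container image and configuration, rather than
building and maintaining a separate guest image.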
---
Lingxian Kong
Senior Software Engineer
Catalyst Cloud
www.catalystcloud.nz

On Fri, Aug 28, 2020 at 6:45 AM ARUNKUMAR PALANISAMY wrote:

Hello Team,

My name is Arunkumar Palanisamy. As part of our project requirements, we are
evaluating Trove components and need your support with experimental datastore
images for cluster testing (Redis, Cassandra, MongoDB, Couchbase).

1.) We are running a devstack environment with the Victoria OpenStack release,
and with this image (trove-master-guest-ubuntu-bionic-dev.qcow2) we are able to
deploy a MySQL instance, but we are getting the error below while creating
MongoDB instances:

"ModuleNotFoundError: No module named 'trove.guestagent.datastore.experimental'"

2.) While trying to create a MongoDB image with the diskimage-builder tool, we
are getting a "Block device" element error.

Regards,
Arunkumar Palanisamy
Cell: +49 172 6972490

=====-----=====-----=====
Notice: The information contained in this e-mail message and/or attachments to
it may contain confidential or privileged information. If you are not the
intended recipient, any dissemination, use, review, distribution, printing or
copying of the information contained in this e-mail message and/or attachments
to it are strictly prohibited. If you have received this communication in
error, please notify us by reply e-mail or telephone and immediately and
permanently delete the message and any attachments. Thank you

-------------- next part --------------
An HTML attachment was scrubbed...
URL: