From amotoki at gmail.com Wed Jul 1 02:38:51 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 1 Jul 2020 11:38:51 +0900 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: On Wed, Jun 24, 2020 at 10:22 PM Rodolfo Alonso Hernandez wrote: > > Hello all: > > Along this years we have increased the number of DB migrations each time we needed a new DB schema. This is good because that means the project is evolving and adding new features. > > Although this is not a problem per se, there are some inconvenients: > - Every time a system is deployed (for example in the CI using devstack), the initial DB schema is created. Then, each migration is applied sequentially. > - Some FT tests are still checking the sanity of some migrations [1] implemented a few releases ago. > - We are still testing the contract DB migrations. Of course, this is something supported before and we still need to apply those revisions. > - "TestWalkMigrationsMysql" and "TestModelsMigrationsMysql", both using MySQL backend, are still affected by LP#1687027. > > The proposal is to remove some DB migrations, starting from Liberty; of course, because all migrations must be applied in a specific order, we should begin from the initial revision, "kilo". The latest migration to be removed should be decided depending on the stable releases support. > > Apart from mitigating or solving some of the commented problems, we can "group" the DB model definition in one place. E.g.: "subnetpools" table is created in "other_extensions_init_ops". This file contains the first table. However is modified in at least two migrations: > - 1b4c6e320f79_address_scope_support_in_subnetpool: added "address_scope_id" field > - 13cfb89f881a_add_is_default_to_subnetpool: added "is_default" field > > Instead of having (at least) three places where the "subnetpools" DB schema is defined, we can remove the Mitaka migration and group this definition in just one place. > > One possible issue: some migrations add dependencies on other tables. That means the table the dependency is referring should be created in advance. That implies that, in some cases, the table creation order should be modified. That should never affect subsequent created tables or migrations. > > Do you see any inconvenience on this proposal? Am I missing something that I didn't consider? > > Thank you and regards. > > [1]https://github.com/openstack/neutron/blob/9fd60ffaac6b178de62dab169c826d52f7bfbb2d/neutron/tests/functional/db/test_migrations.py Hi, Simplification sounds good in general. Previously (up to Liberty release or some), we squashed all migrationed up to a specific past release. If you look at the git log of neutron/db/migration/alembic_migrations/versions/kilo_initial.py, you can see an example. However, it was stopped as squashing migrations needs to be done very carefully and even if we don't squash migrations the overhead of alembic migrations is not so high. You now raise this again, so it might be time to revisit it, so I am not against your proposal in general. I am not sure what you mean by "remove some DB migrations". Squashing migrations only related to some tables potentially introduces some confusion. A simpler approach looks like to merge all migrations up to a specific release (queens or rocky?). I think this approach addresses the problems you mentioned above. Thought? 
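To make the subnetpools example concrete, a squashed init-ops module could create the table with the later columns already in place, roughly like this (an illustrative sketch only: the column list and types are abbreviated and assumed, not the real neutron schema):

    from alembic import op
    import sqlalchemy as sa


    def create_subnetpools():
        # Sketch: the real table has more columns. The two columns below
        # were originally added by the separate revisions 1b4c6e320f79 and
        # 13cfb89f881a and would now simply be part of the initial schema.
        op.create_table(
            'subnetpools',
            sa.Column('id', sa.String(36), primary_key=True),
            sa.Column('name', sa.String(255)),
            sa.Column('address_scope_id', sa.String(36), nullable=True),
            sa.Column('is_default', sa.Boolean(), nullable=False,
                      server_default=sa.sql.false()),
        )

As long as the squashed revision keeps the revision identifier of the newest migration it absorbs, deployments that are already at or beyond that revision should not notice any difference.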
Akihiro From amotoki at gmail.com Wed Jul 1 02:49:20 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 1 Jul 2020 11:49:20 +0900 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona wrote: > > Hi, > Simplification sounds good (I do not take into considerations like "no code fanatic movements" or similar). > How this could affect upgrade, I am sure there are deployments older than pike, and those at a point will > got for some newer version (I hope we can give them good answers for their problems as Openstack) > > What do you think about stadium projects? As those have much less activity (as mostly solve one rather specific problem), > and much less migration scripts shall we just "merge" those to init ops? > I checked quickly a few stadium project and only bgpvpn has newer migration scripts than pike. In my understanding, squashing migrations can be done repository by repository. A revision hash of each migration is not changed and head revisions are stored in the database per repository, so it should work. For initial deployments, neutron-db-manage runs all db migrations from the initial revision to a specified revision (release), so it has no problem. For upgrade scenarios, this change just means that we just dropped support upgrade from releases included in squashed migrations. For example, if we squash migrations up to rocky (and create rocky_initial migration) in the neutron repo, we no longer support db migration from releases before rocky. This would be the only difference I see. Thanks, Akihiro > > Regards > Lajos > > Rodolfo Alonso Hernandez ezt írta (időpont: 2020. jún. 24., Sze, 15:25): >> >> Hello all: >> >> Along this years we have increased the number of DB migrations each time we needed a new DB schema. This is good because that means the project is evolving and adding new features. >> >> Although this is not a problem per se, there are some inconvenients: >> - Every time a system is deployed (for example in the CI using devstack), the initial DB schema is created. Then, each migration is applied sequentially. >> - Some FT tests are still checking the sanity of some migrations [1] implemented a few releases ago. >> - We are still testing the contract DB migrations. Of course, this is something supported before and we still need to apply those revisions. >> - "TestWalkMigrationsMysql" and "TestModelsMigrationsMysql", both using MySQL backend, are still affected by LP#1687027. >> >> The proposal is to remove some DB migrations, starting from Liberty; of course, because all migrations must be applied in a specific order, we should begin from the initial revision, "kilo". The latest migration to be removed should be decided depending on the stable releases support. >> >> Apart from mitigating or solving some of the commented problems, we can "group" the DB model definition in one place. E.g.: "subnetpools" table is created in "other_extensions_init_ops". This file contains the first table. However is modified in at least two migrations: >> - 1b4c6e320f79_address_scope_support_in_subnetpool: added "address_scope_id" field >> - 13cfb89f881a_add_is_default_to_subnetpool: added "is_default" field >> >> Instead of having (at least) three places where the "subnetpools" DB schema is defined, we can remove the Mitaka migration and group this definition in just one place. >> >> One possible issue: some migrations add dependencies on other tables. 
That means the table the dependency is referring should be created in advance. That implies that, in some cases, the table creation order should be modified. That should never affect subsequent created tables or migrations. >> >> Do you see any inconvenience on this proposal? Am I missing something that I didn't consider? >> >> Thank you and regards. >> >> [1]https://github.com/openstack/neutron/blob/9fd60ffaac6b178de62dab169c826d52f7bfbb2d/neutron/tests/functional/db/test_migrations.py >> From skaplons at redhat.com Wed Jul 1 07:39:17 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 1 Jul 2020 09:39:17 +0200 Subject: [neutron][drivers] Propose Rodolfo Alonso Hernandez for Neutron drivers team In-Reply-To: References: <20200623070333.kdvndgypjmuli7um@skaplons-mac> Message-ID: <6801333A-BAD9-4DC5-B900-7AF543B81DE3@redhat.com> Hi, It is already a week since I sent this nomination and I got only very positive feedback I added Rodolfo to the Neutron drivers team now. Welcome in the drivers Rodolfo and see You on our Friday’s meeting :) > On 25 Jun 2020, at 03:50, Akihiro Motoki wrote: > > +1 from me too. > It would be a great addition to the team. > > --amotoki > > On Tue, Jun 23, 2020 at 4:03 PM Slawek Kaplonski wrote: >> >> Hi, >> >> Rodolfo is very active Neutron contributor since long time. He has wide >> knowledge about all or almost all areas of the Neutron and Neutron stadium >> projects. >> He is an expert e.g. in ovs agent, pyroute and privsep module, openvswitch >> firewall, db layer, OVO and probably many others. He also has very good >> understanding about Neutron project in general, about it's design and >> direction of development. >> >> Rodolfo is also active on our drivers meetings already and I think that his >> feedback about many things there is very good and valuable for the team. >> For all these reasons I think that he will be great addition to our >> drivers team. >> >> I will keep this nomination open for a week waiting for Your feedback and >> votes. >> >> -- >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> > — Slawek Kaplonski Senior software engineer Red Hat From ralonsoh at redhat.com Wed Jul 1 07:58:06 2020 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 1 Jul 2020 08:58:06 +0100 Subject: [neutron][drivers] Propose Rodolfo Alonso Hernandez for Neutron drivers team In-Reply-To: <6801333A-BAD9-4DC5-B900-7AF543B81DE3@redhat.com> References: <20200623070333.kdvndgypjmuli7um@skaplons-mac> <6801333A-BAD9-4DC5-B900-7AF543B81DE3@redhat.com> Message-ID: Thank you very much! I'll do my best (Yoda said "there is no try"). Regards. On Wed, Jul 1, 2020 at 8:39 AM Slawek Kaplonski wrote: > Hi, > > It is already a week since I sent this nomination and I got only very > positive feedback I added Rodolfo to the Neutron drivers team now. > Welcome in the drivers Rodolfo and see You on our Friday’s meeting :) > > > On 25 Jun 2020, at 03:50, Akihiro Motoki wrote: > > > > +1 from me too. > > It would be a great addition to the team. > > > > --amotoki > > > > On Tue, Jun 23, 2020 at 4:03 PM Slawek Kaplonski > wrote: > >> > >> Hi, > >> > >> Rodolfo is very active Neutron contributor since long time. He has wide > >> knowledge about all or almost all areas of the Neutron and Neutron > stadium > >> projects. > >> He is an expert e.g. in ovs agent, pyroute and privsep module, > openvswitch > >> firewall, db layer, OVO and probably many others. 
He also has very good > >> understanding about Neutron project in general, about it's design and > >> direction of development. > >> > >> Rodolfo is also active on our drivers meetings already and I think that > his > >> feedback about many things there is very good and valuable for the team. > >> For all these reasons I think that he will be great addition to our > >> drivers team. > >> > >> I will keep this nomination open for a week waiting for Your feedback > and > >> votes. > >> > >> -- > >> Slawek Kaplonski > >> Senior software engineer > >> Red Hat > >> > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig at stackhpc.com Wed Jul 1 09:57:55 2020 From: stig at stackhpc.com (Stig Telfer) Date: Wed, 1 Jul 2020 10:57:55 +0100 Subject: [scientific-sig] No IRC meeting today Message-ID: Hi All - Unfortunately I am not available to help with today’s Scientific SIG IRC meeting. However, if you haven’t done so already I recommend signing up for the OpenDev virtual event - https://www.openstack.org/events/opendev-2020/ - today it’s bare metal and edge use cases. The last two days have been very useful sessions. Cheers, Stig From ruslanas at lpic.lt Wed Jul 1 12:19:51 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 1 Jul 2020 14:19:51 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: Hi all! Here we go, we are in the second part of this interesting troubleshooting! 1) I have LogTool setup.Thank you Arkady. 2) I have user OSP to create instance, and I have used virsh to create instance. 2.1) OSP way is failing in either way, if it is volume-based or image-based, it is failing either way.. [1] and [2] 2.2) when I create it using CLI: [0] [3] any ideas what can be wrong? What options I should choose? I have one network/vlan for whole cloud. I am doing proof of concept of remote booting, so I do not have br-ex setup. and I do not have br-provider. There is my compute[5] and controller[6] yaml files, Please help, how it should look like so it would have br-ex and br-int connected? as br-int now is in UNKNOWN state. And br-ex do not exist. As I understand, in roles data yaml, when we have tag external it should create br-ex? or am I wrong? [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs [2] http://paste.openstack.org/show/795431/ < controller logs [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ [4] http://paste.openstack.org/show/795433/ < xml file for [5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml [6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler wrote: > Hi all! > > I was able to analyze the attached log files and I hope that the results > may help you understand what's going wrong with instance creation. 
> You can find *Log_Tool's unique exported Error blocks* here: > http://paste.openstack.org/show/795356/ > > *Some statistics and problematical messages:* > ##### Statistics - Number of Errors/Warnings per Standard OSP log since: > 2020-06-30 12:30:00 ##### > Total_Number_Of_Errors --> 9 > /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 > /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 > /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 > > *nova-compute.log* > *default default] Error launching a defined domain with XML: type='kvm'>* > 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager > [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b > 69134106b56941698e58c61... > 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal > *error*: qemu unexpectedly closed the monitor: > 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... > he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set > MSR 0x48e to 0xfff9fffe04006172* > _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. > [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most > recent call last): > 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: > 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File > "/usr/lib/python3.6/site-packages/nova/vir... > > *server.log * > 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': 422} > returned with failed status* > > *ovn_controller.log* > 272-2020-06-30T12:30:10.126079625+02:00 stderr F > 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for > network 'datacentre'* > > Thanks! > > Compute nodes are baremetal or virtualized?, I've seen similar bug reports >>>>>>> when using nested virtualization in other OSes. >>>>>>> >>>>>> baremetal. Dell R630 if to be VERY precise. >>>>> >>>>> Thank you, I will try. I also modified a file, and it looked like it >>>>> relaunched podman container once config was changed. Either way, if I >>>>> understand Linux config correctly, the default value for user and group is >>>>> root, if commented out: >>>>> #user = "root" >>>>> #group = "root" >>>>> >>>>> also in some logs, I saw, that it detected, that it is not AMD CPU :) >>>>> and it is really not AMD CPU. >>>>> >>>>> >>>>> Just for fun, it might be important, here is how my node info looks. >>>>> ComputeS01Parameters: >>>>> NovaReservedHostMemory: 16384 >>>>> KernelArgs: "crashkernel=no rhgb" >>>>> ComputeS01ExtraConfig: >>>>> nova::cpu_allocation_ratio: 4.0 >>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>> _______________________________________________ >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbultel at redhat.com Wed Jul 1 15:05:21 2020 From: mbultel at redhat.com (Mathieu Bultel) Date: Wed, 1 Jul 2020 17:05:21 +0200 Subject: [tripleo][validations] new Validation Framework demos Message-ID: Hey TripleO, I have recorded three demos with the new Validation Framework (VF): 1st demo is similar to what Gael did few months ago but with the new code refactored (validations-libs/validations-common projects): https://asciinema.org/a/NRLULghjJa87qxRD9Nfq0FYoa 2nd demo is a use of the VF without any openstack/TripleO prerequisite, on a fresh and empty Ubuntu docker container, with only validations-libs and validations-common projects. 
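Roughly, the non-TripleO usage boils down to something like the following (a sketch only: the exact class and method names from validations-libs are assumptions here, and 'check-ram' is just an example validation shipped by validations-common):

    # Sketch of driving the framework from plain Python on a minimal host;
    # names are assumed, not copied from the demo.
    from validations_libs.validation_actions import ValidationActions

    actions = ValidationActions(
        validation_path='/usr/share/ansible/validation-playbooks')
    # List the available validations, then run one of them against localhost.
    print(actions.list_validations())
    actions.run_validations(validation_name=['check-ram'],
                            inventory='localhost')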
It shows that only with a apt-get install git and python3-pip and with a basic python project installation we can run validations and use the framework: https://asciinema.org/a/2Jp9LZbN0xhJAR09zIpI6OpuB So it can answer a few demands such as: How to run validations as prep undercloud installation ? How to run validations on a non-openstack project ? What are the bare minimum requirements for being able to run Validations on a system ? May I run Validation remotely from my machine ? etc... The third one is mainly related to the deployment itself of TripleO. By using a simple PoC (https://review.opendev.org/#/c/724289/), I was able to make TripleO consuming the validations-libs framework and validation logging callback plugin. So it shows in this demo how the deploy steps playbook can be logged, parsed and shown with the VF CLI. This can be improve, modify & so on of course... it's basic usage. https://asciinema.org/a/344484 https://asciinema.org/a/344509 Mathieu. -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuel.mutel at gmail.com Wed Jul 1 15:34:32 2020 From: samuel.mutel at gmail.com (Samuel Mutel) Date: Wed, 1 Jul 2020 17:34:32 +0200 Subject: [CEILOMETER] Error when sending to prometheus pushgateway Message-ID: Hello, I have two questions about ceilometer (openstack version rocky). - First of all, it seems that ceilometer is sending metrics every hour and I don't understand why. - Next, I am not able to setup ceilometer to send metrics to prometheus pushgateway. Here is my configuration: > sources: > - name: meter_file > interval: 30 > meters: > - "*" > sinks: > - prometheus > > sinks: > - name: prometheus > publishers: > - prometheus://10.60.4.11:9091/metrics/job/ceilometer > Here is the error I received: > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 > # TYPE memory gauge > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2048 > # TYPE disk.ephemeral.size gauge > disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > # TYPE disk.root.size gauge > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > : HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http Traceback > (most recent call last): > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", line 178, > in _do_post > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > res.raise_for_status() > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/requests/models.py", line 935, in > raise_for_status > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http raise > HTTPError(http_error_msg, response=self) > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http HTTPError: > 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > Thanks for you help on this topic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Wed Jul 1 17:26:47 2020 From: ashlee at openstack.org (Ashlee Ferguson) Date: Wed, 1 Jul 2020 12:26:47 -0500 Subject: [all][summit][cfp] 2020 Open Infrastructure Summit Call for Presentations Open! 
Message-ID: <7DC7216B-91E8-4660-B759-788D818BCED5@openstack.org> We’re excited to announce that the Call for Presentations [1] for the 2020 Open Infrastructure Summit is now open until August 4! During the Summit, you’ll be able to join the people building and operating open infrastructure. Submit sessions featuring projects including Airship, Ansible, Ceph, Kata Containers, Kubernetes, ONAP, OpenStack, OPNFV, StarlingX and Zuul! 2020 Tracks • 5G, NFV & Edge • AI, Machine Learning & HPC • CI/CD • Container Infrastructure • Getting Started • Hands-on Workshops • Open Development • Private & Hybrid Cloud • Public Cloud • Security Types of sessions Presentations; demos encouraged Panel Discussions Lightning Talks If your talk is not selected for the official track schedule, we may reach out to have you present it as a Lightning Talk during the Summit in a shorter 10-15 minute format SUBMIT YOUR PRESENTATION [1] - Deadline August 4, 2020 Summit CFP is only for presentation, panel, and workshop submissions. The content submission process for the Forum and Project Teams Gathering (PTG) will be managed separately in the upcoming months. Programming Committee nominations are also now open. The Programming Committee helps select sessions from the CFP for the Summit schedule. Nominate yourself or or someone else for the Programming Committee [2] before July 10, 2020 and help us program the Summit! Registration and sponsorship coming soon! For sponsorship inquiries, please email kendall at openstack.org . Please email speakersupport at openstack.org with any CFP questions or feedback. Thanks, Ashlee [1] cfp.openstack.org [2] https://openstackfoundation.formstack.com/forms/programmingcommitteenom_summit2020 Ashlee Ferguson Community & Events Coordinator OpenStack Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongbin034 at gmail.com Wed Jul 1 19:24:42 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Wed, 1 Jul 2020 15:24:42 -0400 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' Message-ID: Hi all, A short question. I saw a few projects are using the name 'ca_file' [1] as config option, while others are using 'cafile' [2]. I wonder what is the flavorite name convention? I asked this question because Kolla developer suggested Zun to rename from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to confirm if this is a good idea from Keystone's perspective. Thanks. Best regards, Hongbin [1] http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= [2] http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= [3] https://review.opendev.org/#/c/738329/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Wed Jul 1 20:28:45 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 1 Jul 2020 15:28:45 -0500 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' In-Reply-To: References: Message-ID: On 7/1/20 2:24 PM, Hongbin Lu wrote: > Hi all, > > A short question. I saw a few projects are using the name 'ca_file' > [1] as config option, while others are using 'cafile' [2]. I wonder > what is the flavorite name convention? > > I asked this question because Kolla developer suggested Zun to rename > from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to > confirm if this is a good idea from Keystone's perspective. Thanks. 
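For what it's worth, if the rename does go ahead, oslo.config can keep the old spelling working during a deprecation period. A rough sketch (the help text is invented and this is not what the Zun patch actually does):

    # Hypothetical option definition: 'cafile' becomes the canonical name
    # and the old 'ca_file' spelling keeps working, with a deprecation
    # warning logged when it is used.
    from oslo_config import cfg

    opts = [
        cfg.StrOpt('cafile',
                   deprecated_name='ca_file',
                   help='Path to the CA certificate bundle used to verify '
                        'TLS connections.'),
    ]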
> > Best regards, > Hongbin > > [1] > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= > [2] > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= > [3] https://review.opendev.org/#/c/738329/ Cinder and Glance both use ca_file (and ssl_ca_file and vmware_ca_file, and registry_client_ca_file). From keystone_auth, we do also have cafile. Personally, I find the separation of ca_file to be much easier to read. Sean From whayutin at redhat.com Wed Jul 1 20:49:22 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 1 Jul 2020 14:49:22 -0600 Subject: [tripleo][validations] new Validation Framework demos In-Reply-To: References: Message-ID: On Wed, Jul 1, 2020 at 9:07 AM Mathieu Bultel wrote: > Hey TripleO, > > I have recorded three demos with the new Validation Framework (VF): > 1st demo is similar to what Gael did few months ago but with the new code > refactored (validations-libs/validations-common projects): > https://asciinema.org/a/NRLULghjJa87qxRD9Nfq0FYoa > > 2nd demo is a use of the VF without any openstack/TripleO prerequisite, > on a fresh and empty Ubuntu docker container, with only validations-libs > and validations-common projects. > It shows that only with a apt-get install git and python3-pip and with a > basic python project installation we can run validations and use the > framework: > https://asciinema.org/a/2Jp9LZbN0xhJAR09zIpI6OpuB > > So it can answer a few demands such as: > How to run validations as prep undercloud installation ? > How to run validations on a non-openstack project ? > What are the bare minimum requirements for being able to run > Validations on a system ? May I run Validation remotely from my > machine ? etc... > > The third one is mainly related to the deployment itself of TripleO. > By using a simple PoC (https://review.opendev.org/#/c/724289/), I was > able to make TripleO consuming the validations-libs framework and > validation logging callback plugin. > So it shows in this demo how the deploy steps playbook can be logged, > parsed and shown with the VF CLI. This can be improve, modify & so on of > course... it's basic usage. > https://asciinema.org/a/344484 > https://asciinema.org/a/344509 > > Mathieu. > > Thanks for posting these Mathieu! This helps to visualize some of the topics discussed at the PTG. I like a lot of what I see here and I can see the value it will bring. I have some minor questions about the format of the logs.. like each task has TIMING in bold. Silly stuff like that. Looking forward to looking at this more in depth. Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Jul 1 21:45:11 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 1 Jul 2020 17:45:11 -0400 Subject: [cinder] spec freeze now in effect Message-ID: The Cinder spec freeze is now in effect. Here is the rundown on how the specs that have not yet been accepted for Victoria stand: The following specs have a spec freeze exception. Specs must be merged by 1600 UTC on 10 July. (Ideally, you'll have your revisions completed before next week's Cinder meeting on 8 July so we can discuss any issues at the meeting and give you time in case you need to make a final revision.) 
Remove quota usage cache https://review.opendev.org/#/c/730701/ - need to address some comments on the spec Support modern compression algorithms in cinder backup https://review.opendev.org/#/c/726307/ - needs a requirements change analysis; see comments on the review Reset state robustification https://review.opendev.org/#/c/682456/ - just needs to make moving the "force" option to cinder-manage explicit Default volume type overrides https://review.opendev.org/#/c/733555/ - need some tiem to work out the REST API change more carefully The following spec has been rejected for Victoria, but the team is OK with this being clarified (suggestions are in the comments on the review) and proposed for Wallaby: Support revert any snapshot to the volume https://review.opendev.org/#/c/736111/ The following spec has been rejected for Victoria, but because it's really a bug: volume list query optimization https://review.opendev.org/#/c/726070/ - it's been turned into https://bugs.launchpad.net/cinder/+bug/1885961 and the proposer can submit patches that address the bug. cheers, brian From openstack at nemebean.com Wed Jul 1 21:58:08 2020 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 1 Jul 2020 16:58:08 -0500 Subject: [oslo] PTO on Monday Message-ID: <6e2dc5a8-434d-380a-241c-5b29d26f8f12@nemebean.com> Hi Oslo, I'm making this a four day weekend (Friday is a US holiday), so I won't be around for the meeting on Monday. If someone else wants to run it then feel free to hold it without me. Otherwise we'll return to the regular schedule the following week. -Ben From whayutin at redhat.com Thu Jul 2 01:18:10 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 1 Jul 2020 19:18:10 -0600 Subject: [tripleo][ci] status RED In-Reply-To: References: Message-ID: On Mon, Jun 29, 2020 at 10:41 AM Wesley Hayutin wrote: > Greetings, > > Unfortunately both check and gate are RED atm due to [1]. The issue w/ > CirrOS-5.1 was fixed / reverted over the weekend [2]. I expect the check > and gate jobs to continue to be RED for the next few days as the > investigation proceeds. > > I would encourage folks to only workflow patches that are critical as the > chances you will actually merge anything is not great. > > > [1] https://bugs.launchpad.net/tripleo/+bug/1885286 > [2] https://review.opendev.org/#/c/738025/ > OK.. We have finally got our hands on an upstream CentOS-8 node and found the issue w/ retry_attempts. There is an issue w/ CentOS-8, OVS, and os-net-config. We have mitigated the issue by ensuring NetworkManager is disabled before the TripleO install bits start. Still working the issue but I think we're back to green. Thanks Alex and Sagi!!! FYI: https://bugs.launchpad.net/tripleo/+bug/1885286 I'm updating the topic in #tripleo now :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Thu Jul 2 01:21:15 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 1 Jul 2020 19:21:15 -0600 Subject: [tripleo][ci] status RED In-Reply-To: References: Message-ID: On Wed, Jul 1, 2020 at 7:18 PM Wesley Hayutin wrote: > > > On Mon, Jun 29, 2020 at 10:41 AM Wesley Hayutin > wrote: > >> Greetings, >> >> Unfortunately both check and gate are RED atm due to [1]. The issue w/ >> CirrOS-5.1 was fixed / reverted over the weekend [2]. I expect the check >> and gate jobs to continue to be RED for the next few days as the >> investigation proceeds. 
>> >> I would encourage folks to only workflow patches that are critical as the >> chances you will actually merge anything is not great. >> >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1885286 >> [2] https://review.opendev.org/#/c/738025/ >> > > OK.. > We have finally got our hands on an upstream CentOS-8 node and found the > issue w/ retry_attempts. There is an issue w/ CentOS-8, OVS, and > os-net-config. We have mitigated the issue by ensuring NetworkManager is > disabled before the TripleO install bits start. > > Still working the issue but I think we're back to green. Thanks Alex and > Sagi!!! > Also big thanks to the upstream infra folks, clark, fungi and others, for all the debug and extra time they spent w/ the tripleo team!! Much appreciated :) > > FYI: https://bugs.launchpad.net/tripleo/+bug/1885286 > > I'm updating the topic in #tripleo now > > :) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 2 07:23:39 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 2 Jul 2020 09:23:39 +0200 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' In-Reply-To: References: Message-ID: On Wed, Jul 1, 2020 at 10:31 PM Sean McGinnis wrote: > > On 7/1/20 2:24 PM, Hongbin Lu wrote: > > Hi all, > > > > A short question. I saw a few projects are using the name 'ca_file' > > [1] as config option, while others are using 'cafile' [2]. I wonder > > what is the flavorite name convention? > > > > I asked this question because Kolla developer suggested Zun to rename > > from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to > > confirm if this is a good idea from Keystone's perspective. Thanks. > > > > Best regards, > > Hongbin > > > > [1] > > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= > > [2] > > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= > > [3] https://review.opendev.org/#/c/738329/ > > Cinder and Glance both use ca_file (and ssl_ca_file and vmware_ca_file, > and registry_client_ca_file). > From keystone_auth, we do also have cafile. > > Personally, I find the separation of ca_file to be much easier to read. > > Sean > > Yeah, it was me to suggest the aliasing. We found that the 'cafile' seems more prevalent. We missed that underscore for Zun and scratched our heads "what are we doing wrong there?". Nova has its most interesting because it uses cafile for clients but ca_file for hypervisors 🤷 -yoctozepto From info at dantalion.nl Thu Jul 2 08:37:35 2020 From: info at dantalion.nl (info at dantalion.nl) Date: Thu, 2 Jul 2020 10:37:35 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated Message-ID: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> Hello, the images on docker.io have last been updated 9 months ago https://hub.docker.com/u/loci, I was wondering when do they get updated? As I am currently waiting for the image for Watcher to be available, this image has recently been added as gate job. I require this image in order for the OpenStack helm charts to test https://review.opendev.org/#/c/720140/. 
Kind regards, Corne lukken From paye600 at gmail.com Thu Jul 2 10:35:59 2020 From: paye600 at gmail.com (Roman Gorshunov) Date: Thu, 2 Jul 2020 12:35:59 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> References: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> Message-ID: <61872A8F-5495-4C6E-AD86-14A61F9431A1@gmail.com> Hello Corne, Thank you for your email. i have investigated the issue, and seems that we have image push broken for some time. While we work on resolution, I could advice you to locally build images, if that suits you. I would post a reply here to the mailing list once issue is resolved. Again, thank you for paying attention and informing us. Best regards, Roman Gorshunov From sathlang at redhat.com Thu Jul 2 11:20:48 2020 From: sathlang at redhat.com (Sofer Athlan-Guyot) Date: Thu, 02 Jul 2020 13:20:48 +0200 Subject: [tripleo][update][blueprint] Update refactor: more feedback, more control, more speed. Message-ID: <875zb6noxb.fsf@s390sx.i-did-not-set--mail-host-address--so-tickle-me> Hi, hope you liked the title, I find it catchy. Update is mainly an afterthought that needs to work. So we mainly fix "stuff" there. No major change happened there since a long time. Following the PTG, I'm proposing a new blueprint and a bug: 1. Refactor tripleo update to offer the user more feedback and control[1]. 2. Registering node and repos can happen after some module check for packages[2]. I'm pretty new to this so I would need feedback about the form and content. For instance, point 2. could be a blueprint instead of a bug, tell me what you think. 1. refactor update step to load step playbook instead of looping over the steps: - this will speed up update (no more skipped tasks) - this will offer point of recovery when the update fails (by doing something like in named debug[3] for deployment) 2. refactor/fix? host-prep-tasks to include two steps: - step0 to add pre-update in-flight validation to the update process and rhosp registration; - step1 to all other tasks; - make sure it run in parallel on all nodes Point 1. would be a catch up with deployment. It offers speed improvement as we wouldn't skip tasks anymore. We could notify the user of what we are doing: "I'm removing the node from the cluster" instead of "step1". It would offer the user the hook to be able to restart a failed update from any step. Overall a big win, I think. Point 2. is newer, I filled it as a bug because I bumped into it as an issue when trying to add validation for subscription. It opens some possibilities for the update: - in-flight validation at the beginning of the update process that would be skipped during deployment using tag - using tags we could also run specific day 2 action outside of the update window: openstack overcloud update run --tags 'pre-update-validation' (with pre-update-validation in host-prep-tasks step0) openstack overcloud update run --tags 'rhsm-subscription' Well, it looked promising to me. Now, tell me what you think, but please, be nice, I'm old and susceptible. I have more coming, sorted by order of though I put into it, starting with the ones I though about more: - Check if we need a reboot of the server and notify the user. - Gain some more speed and clarity by having a running-on-all-host-in-parallel-host-update-prep-tasks new step. For instance all HA image tagging magic could go in there. - Investigate converge and check if we still could not further optimize it for update. 
I would like to gain more experience with the process before I filled those new blueprints. I'm going to draft a spec for the proposed blueprint and then I'll push some WIP code. Thanks, [1] https://blueprints.launchpad.net/tripleo/+spec/tripleo-update-smart-steps [2] https://bugs.launchpad.net/tripleo/+bug/1886028 [1] https://review.opendev.org/#/c/636731/ -- Sofer Athlan-Guyot chem on #irc DFG:Upgrades From ionut at fleio.com Thu Jul 2 12:42:36 2020 From: ionut at fleio.com (Ionut Biru) Date: Thu, 2 Jul 2020 15:42:36 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hello Rafael, Since the merging window for ussuri was long passed for those commits, is it safe to assume that it will not land in stable/ussuri at all and those will be available for victoria? How safe is to cherry pick those commits and use them in production? On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > The dynamic pollster in Ceilometer will be first released in Ussuri. > However, there are some important PRs still waiting for a merge, that might > be important for your use case: > * https://review.opendev.org/#/c/722092/ > * https://review.opendev.org/#/c/715180/ > * https://review.opendev.org/#/c/715289/ > * https://review.opendev.org/#/c/679999/ > * https://review.opendev.org/#/c/709807/ > > > On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves > wrote: > >> >> >> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >> >>> Hello, >>> >>> I want to meter the loadbalancer into gnocchi for billing purposes in >>> stein/train and ceilometer doesn't support dynamic pollsters. >>> >> >> I think I misunderstood your use case, sorry. I read it as if you wanted >> to know "if a loadbalancer was deployed and has status active". >> >> >>> Until I upgrade to Ussuri, is there a way to accomplish this? >>> >> >> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >> Ceilometer project. >> >> >>> >>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves >>> wrote: >>> >>>> Hi Ionut, >>>> >>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>> >>>>> Hello guys, >>>>> I was trying to add in polling.yaml and pipeline from ceilometer the >>>>> following: >>>>> - network.services.lb.active.connections >>>>> - network.services.lb.health_monitor >>>>> - network.services.lb.incoming.bytes >>>>> - network.services.lb.listener >>>>> - network.services.lb.loadbalancer >>>>> - network.services.lb.member >>>>> - network.services.lb.outgoing.bytes >>>>> - network.services.lb.pool >>>>> - network.services.lb.total.connections >>>>> >>>>> But it doesn't work, I think they are for the old lbs that were >>>>> supported in neutron. >>>>> >>>>> I found >>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>> but this is not available in stein or train. >>>>> >>>>> I was wondering if there is a way to meter loadbalancers from octavia. >>>>> I mostly want for start to just meter if a loadbalancer was deployed >>>>> and has status active. >>>>> >>>> >>>> You can get the provisioning and operating status of Octavia load >>>> balancers via the Octavia API. There is also an API endpoint that returns >>>> the full load balancer status tree [1]. Additionally, Octavia has >>>> three API endpoints for statistics [2][3][4]. >>>> >>>> I hope this helps with your use case. 
>>>> >>>> Cheers, >>>> Carlos >>>> >>>> [1] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>> [2] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>> [3] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>> [4] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>> >>>> >>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From opensrloo at gmail.com Thu Jul 2 13:37:00 2020 From: opensrloo at gmail.com (Ruby Loo) Date: Thu, 2 Jul 2020 09:37:00 -0400 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: Hi, On Tue, Jun 30, 2020 at 10:53 PM Akihiro Motoki wrote: > On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona wrote: > > > > Hi, > > Simplification sounds good (I do not take into considerations like "no > code fanatic movements" or similar). > > How this could affect upgrade, I am sure there are deployments older > than pike, and those at a point will > > got for some newer version (I hope we can give them good answers for > their problems as Openstack) > > > > What do you think about stadium projects? As those have much less > activity (as mostly solve one rather specific problem), > > and much less migration scripts shall we just "merge" those to init ops? > > I checked quickly a few stadium project and only bgpvpn has newer > migration scripts than pike. > > In my understanding, squashing migrations can be done repository by > repository. > A revision hash of each migration is not changed and head revisions > are stored in the database per repository, so it should work. > For initial deployments, neutron-db-manage runs all db migrations from > the initial revision to a specified revision (release), so it has no > problem. > For upgrade scenarios, this change just means that we just dropped > support upgrade from releases included in squashed migrations. > For example, if we squash migrations up to rocky (and create > rocky_initial migration) in the neutron repo, we no longer support db > migration from releases before rocky. This would be the only > difference I see. > I wonder if this is acceptable (that an OpenStack service will not support db migrations prior to rocky). What is (or is there?) OpenStack's stance wrt support for upgrades? We are using ocata and plan on upgrading but we don't know when that might happen :-( --ruby -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lyarwood at redhat.com Thu Jul 2 13:47:42 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Thu, 2 Jul 2020 14:47:42 +0100 Subject: [all][stable] Moving the stable/ocata to 'Unmaintained' phase and then EOL In-Reply-To: <17260946293.bf44dcaa81001.6800161932877911216@ghanshyammann.com> References: <1725c3cbbd0.11d04fc8645393.9035729090460383424@ghanshyammann.com> <762e58c8-44f6-79d0-d674-43becf3eb42a@gmx.com> <17260946293.bf44dcaa81001.6800161932877911216@ghanshyammann.com> Message-ID: <20200702134742.ux3qaqotc4xlgbku@lyarwood.usersys.redhat.com> On 29-05-20 08:17:16, Ghanshyam Mann wrote: > ---- On Fri, 29 May 2020 07:54:05 -0500 Sean McGinnis wrote ---- > > On 5/29/20 6:34 AM, Előd Illés wrote: > > > [snip] > > > > > > TL;DR: If it's not feasible to fix a general issue of a job, then drop > > > that job. And I think we should not EOL Ocata in general, rather let > > > projects EOL their ocata branch if they cannot invest more time on > > > fixing them. > > > > The interdependency is the trick here. Some projects can easily EOL on > > their own and it's isolated enough that it doesn't cause issues. But for > > other projects, like Cinder and Nova that I mentioned, it's kind of an > > all-or-nothing situation. > > > > I suppose it is feasible that we drop testing to only running unit > > tests. If we don't run any kind of integration testing, then it does > > make these projects a little more independent. > > > > We still have the requirements issues though. Unless someone addresses > > any rot in the stable requirements, even unit tests become hard to run. > > > > Just thinking out loud on some of the issues I see. We can try to follow > > the original EM plan and leave it up to each project to declare their > > intent to go EOL, then tag ocata-eol to close it out. Or we can > > collectively decide Ocata is done and pull the big switch. > > From the stable policy if CI has broken nd no maintainer then we can move that > to unmaintained. And there is always time to revert back to EM if the maintainer shows up. > > IMO, maintaining only with unit tests is not a good idea. > > I have not heard from projects that they are interested to maintain it, if any then we can see > how to proceed otherwise collectively marking Ocata as Unmaintained is the right thing. Yup agreed, I'm going to be proposing that we move stable/ocata to unmaintained for openstack/nova at least FWIW, we haven't seen anything of value land there in the last three months: https://review.opendev.org/#/q/project:openstack/nova+branch:stable/ocata Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From ruslanas at lpic.lt Thu Jul 2 13:56:05 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 2 Jul 2020 15:56:05 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: Hi All, I have one idea, why it might be the issue. during image creation step, I have hadd missing packets: pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs PCS thing can be found in HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... I believe that is a case... so it installed non CentOS8 maintained kvm or some dependent packages.... 
How can I get osops-tools-monitoring-oschecks from centos repos? it is last seen in CentOS7 repos.... $ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2 osops-tools-monitoring-oschecks.noarch 0.0.1-0.20191202171903.bafe3f0.el7 rdo-trunk-train-tested ostree-debuginfo.x86_64 2019.1-2.el7 base-debuginfo (undercloud) [stack at ironic-poc ~]$ can I somehow not include that package in image creation? OR if it is essential, can I create a different repo for that one? On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis wrote: > Hi all! > > Here we go, we are in the second part of this interesting troubleshooting! > > 1) I have LogTool setup.Thank you Arkady. > > 2) I have user OSP to create instance, and I have used virsh to create > instance. > 2.1) OSP way is failing in either way, if it is volume-based or > image-based, it is failing either way.. [1] and [2] > 2.2) when I create it using CLI: [0] [3] > > any ideas what can be wrong? What options I should choose? > I have one network/vlan for whole cloud. I am doing proof of concept of > remote booting, so I do not have br-ex setup. and I do not have br-provider. > > There is my compute[5] and controller[6] yaml files, Please help, how it > should look like so it would have br-ex and br-int connected? as br-int now > is in UNKNOWN state. And br-ex do not exist. > As I understand, in roles data yaml, when we have tag external it should > create br-ex? or am I wrong? > > [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. > [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs > [2] http://paste.openstack.org/show/795431/ < controller logs > [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ > [4] http://paste.openstack.org/show/795433/ < xml file for > [5] > https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml > [6] > https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml > > > On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler > wrote: > >> Hi all! >> >> I was able to analyze the attached log files and I hope that the results >> may help you understand what's going wrong with instance creation. >> You can find *Log_Tool's unique exported Error blocks* here: >> http://paste.openstack.org/show/795356/ >> >> *Some statistics and problematical messages:* >> ##### Statistics - Number of Errors/Warnings per Standard OSP log since: >> 2020-06-30 12:30:00 ##### >> Total_Number_Of_Errors --> 9 >> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >> >> *nova-compute.log* >> *default default] Error launching a defined domain with XML: > type='kvm'>* >> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >> 69134106b56941698e58c61... >> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >> *error*: qemu unexpectedly closed the monitor: >> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set >> MSR 0x48e to 0xfff9fffe04006172* >> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. 
>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >> recent call last): >> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >> "/usr/lib/python3.6/site-packages/nova/vir... >> >> *server.log * >> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': 422} >> returned with failed status* >> >> *ovn_controller.log* >> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >> network 'datacentre'* >> >> Thanks! >> >> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>> >>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>> >>>>>> Thank you, I will try. I also modified a file, and it looked like it >>>>>> relaunched podman container once config was changed. Either way, if I >>>>>> understand Linux config correctly, the default value for user and group is >>>>>> root, if commented out: >>>>>> #user = "root" >>>>>> #group = "root" >>>>>> >>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU :) >>>>>> and it is really not AMD CPU. >>>>>> >>>>>> >>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>> ComputeS01Parameters: >>>>>> NovaReservedHostMemory: 16384 >>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>> ComputeS01ExtraConfig: >>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>> _______________________________________________ >>>>>> >>>>>> > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 2 13:58:46 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 2 Jul 2020 15:58:46 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: by the way in CentOS8, here is an error message I receive when searching around [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested': - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml (IP: 3.87.151.16) Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried [stack at rdo-u ~]$ On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis wrote: > Hi All, > > I have one idea, why it might be the issue. > > during image creation step, I have hadd missing packets: > pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs > PCS thing can be found in HA repo, so I enabled it, but > "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... > > I believe that is a case... > so it installed non CentOS8 maintained kvm or some dependent packages.... > > How can I get osops-tools-monitoring-oschecks from centos repos? it is > last seen in CentOS7 repos.... 
> > $ yum list --enablerepo=* --disablerepo "c7-media" | grep > osops-tools-monitoring-oschecks -A2 > osops-tools-monitoring-oschecks.noarch > 0.0.1-0.20191202171903.bafe3f0.el7 > > rdo-trunk-train-tested > ostree-debuginfo.x86_64 2019.1-2.el7 > base-debuginfo > (undercloud) [stack at ironic-poc ~]$ > > can I somehow not include that package in image creation? OR if it is > essential, can I create a different repo for that one? > > > > > On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis wrote: > >> Hi all! >> >> Here we go, we are in the second part of this interesting troubleshooting! >> >> 1) I have LogTool setup.Thank you Arkady. >> >> 2) I have user OSP to create instance, and I have used virsh to create >> instance. >> 2.1) OSP way is failing in either way, if it is volume-based or >> image-based, it is failing either way.. [1] and [2] >> 2.2) when I create it using CLI: [0] [3] >> >> any ideas what can be wrong? What options I should choose? >> I have one network/vlan for whole cloud. I am doing proof of concept of >> remote booting, so I do not have br-ex setup. and I do not have br-provider. >> >> There is my compute[5] and controller[6] yaml files, Please help, how it >> should look like so it would have br-ex and br-int connected? as br-int now >> is in UNKNOWN state. And br-ex do not exist. >> As I understand, in roles data yaml, when we have tag external it should >> create br-ex? or am I wrong? >> >> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. >> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs >> [2] http://paste.openstack.org/show/795431/ < controller logs >> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >> [4] http://paste.openstack.org/show/795433/ < xml file for >> [5] >> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >> [6] >> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >> >> >> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >> wrote: >> >>> Hi all! >>> >>> I was able to analyze the attached log files and I hope that the results >>> may help you understand what's going wrong with instance creation. >>> You can find *Log_Tool's unique exported Error blocks* here: >>> http://paste.openstack.org/show/795356/ >>> >>> *Some statistics and problematical messages:* >>> ##### Statistics - Number of Errors/Warnings per Standard OSP log since: >>> 2020-06-30 12:30:00 ##### >>> Total_Number_Of_Errors --> 9 >>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>> >>> *nova-compute.log* >>> *default default] Error launching a defined domain with XML: >> type='kvm'>* >>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>> 69134106b56941698e58c61... >>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >>> *error*: qemu unexpectedly closed the monitor: >>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set >>> MSR 0x48e to 0xfff9fffe04006172* >>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. 
>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>> recent call last): >>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >>> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>> "/usr/lib/python3.6/site-packages/nova/vir... >>> >>> *server.log * >>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>> 422} returned with failed status* >>> >>> *ovn_controller.log* >>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>> network 'datacentre'* >>> >>> Thanks! >>> >>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>> >>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>> >>>>>>> Thank you, I will try. I also modified a file, and it looked like it >>>>>>> relaunched podman container once config was changed. Either way, if I >>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>> root, if commented out: >>>>>>> #user = "root" >>>>>>> #group = "root" >>>>>>> >>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU >>>>>>> :) and it is really not AMD CPU. >>>>>>> >>>>>>> >>>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>>> ComputeS01Parameters: >>>>>>> NovaReservedHostMemory: 16384 >>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>> ComputeS01ExtraConfig: >>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>> _______________________________________________ >>>>>>> >>>>>>> >> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Thu Jul 2 14:05:28 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Thu, 2 Jul 2020 15:05:28 +0100 Subject: [nova][stable] The openstack/nova stable/ocata branch is currently unmaintained Message-ID: <20200702140528.yrwrpyv6nt72kzlb@lyarwood.usersys.redhat.com> Hello all, A quick note to highlight that the stable/ocata branch of openstack/nova [1] is formally in the ``Unmaintained`` [2] phase of maintenance will be moved on to the final ``EOL`` phase after a total of 6 months of inactivity. I'm going to suggest that we ignore the following change as this only attempted to remove a job from the experimental queue and doesn't constitute actual maintenance of the branch IMHO. Remove exp legacy-tempest-dsvm-full-devstack-plugin-nfs https://review.opendev.org/#/c/714958/ As a result I consider the branch to have been inactive for 3 of the required 6 months before it can be marked as ``EOL`` [3]. Volunteers are welcome to step forward and attempt to move the branch back to the ``Extended Maintenance`` phase by proposing changes and fixing CI in the next 3 months, otherwise the branch will be marked as ``EOL``. Hopefully this isn't taking anyone by surprise but please let me know if this is going to be an issue! 
Regards, [1] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/ocata [2] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained [3] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From aschultz at redhat.com Thu Jul 2 14:07:46 2020 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 2 Jul 2020 08:07:46 -0600 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: current-passed-ci is not a valid repo. https://trunk.rdoproject.org/centos8-ussuri/ How are you configuring these repos? On Thu, Jul 2, 2020 at 7:59 AM Ruslanas Gžibovskis wrote: > > by the way in CentOS8, here is an error message I receive when searching around > > [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks > Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested': > - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml (IP: 3.87.151.16) > Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried > [stack at rdo-u ~]$ > > On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis wrote: >> >> Hi All, >> >> I have one idea, why it might be the issue. >> >> during image creation step, I have hadd missing packets: >> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >> PCS thing can be found in HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >> >> I believe that is a case... >> so it installed non CentOS8 maintained kvm or some dependent packages.... >> >> How can I get osops-tools-monitoring-oschecks from centos repos? it is last seen in CentOS7 repos.... >> >> $ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2 >> osops-tools-monitoring-oschecks.noarch 0.0.1-0.20191202171903.bafe3f0.el7 >> rdo-trunk-train-tested >> ostree-debuginfo.x86_64 2019.1-2.el7 base-debuginfo >> (undercloud) [stack at ironic-poc ~]$ >> >> can I somehow not include that package in image creation? OR if it is essential, can I create a different repo for that one? >> >> >> >> >> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis wrote: >>> >>> Hi all! >>> >>> Here we go, we are in the second part of this interesting troubleshooting! >>> >>> 1) I have LogTool setup.Thank you Arkady. >>> >>> 2) I have user OSP to create instance, and I have used virsh to create instance. >>> 2.1) OSP way is failing in either way, if it is volume-based or image-based, it is failing either way.. [1] and [2] >>> 2.2) when I create it using CLI: [0] [3] >>> >>> any ideas what can be wrong? What options I should choose? >>> I have one network/vlan for whole cloud. I am doing proof of concept of remote booting, so I do not have br-ex setup. and I do not have br-provider. >>> >>> There is my compute[5] and controller[6] yaml files, Please help, how it should look like so it would have br-ex and br-int connected? as br-int now is in UNKNOWN state. And br-ex do not exist. 
>>> As I understand, in roles data yaml, when we have tag external it should create br-ex? or am I wrong? >>> >>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. >>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs >>> [2] http://paste.openstack.org/show/795431/ < controller logs >>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>> [4] http://paste.openstack.org/show/795433/ < xml file for >>> [5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>> [6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>> >>> >>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler wrote: >>>> >>>> Hi all! >>>> >>>> I was able to analyze the attached log files and I hope that the results may help you understand what's going wrong with instance creation. >>>> You can find Log_Tool's unique exported Error blocks here: http://paste.openstack.org/show/795356/ >>>> >>>> Some statistics and problematical messages: >>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log since: 2020-06-30 12:30:00 ##### >>>> Total_Number_Of_Errors --> 9 >>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>> >>>> nova-compute.log >>>> default default] Error launching a defined domain with XML: >>>> 368-2020-06-30 12:30:10.815 7 ERROR nova.compute.manager [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b 69134106b56941698e58c61... >>>> 70dc50f] Instance failed to spawn: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-06-30T10:30:10.182675Z qemu-kvm: error: failed to set MSR 0... >>>> he monitor: 2020-06-30T10:30:10.182675Z qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172 >>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. >>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] Traceback (most recent call last): >>>> 375-2020-06-30 12:30:10.815 7 ERROR nova.compute.manager [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File "/usr/lib/python3.6/site-packages/nova/vir... >>>> >>>> server.log >>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', 'code': 422} returned with failed status >>>> >>>> ovn_controller.log >>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F 2020-06-30T10:30:10Z|00247|patch|WARN|Bridge 'br-ex' not found for network 'datacentre' >>>> >>>> Thanks! >>>> >>>>>>>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug reports when using nested virtualization in other OSes. >>>>>>>> >>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>> >>>>>>>> Thank you, I will try. I also modified a file, and it looked like it relaunched podman container once config was changed. Either way, if I understand Linux config correctly, the default value for user and group is root, if commented out: >>>>>>>> #user = "root" >>>>>>>> #group = "root" >>>>>>>> >>>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU :) and it is really not AMD CPU. >>>>>>>> >>>>>>>> >>>>>>>> Just for fun, it might be important, here is how my node info looks. 
>>>>>>>> ComputeS01Parameters: >>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>> ComputeS01ExtraConfig: >>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>> _______________________________________________ >>>>>>>> >>> >> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 > > > > -- > Ruslanas Gžibovskis > +370 6030 7030 > _______________________________________________ > users mailing list > users at lists.rdoproject.org > http://lists.rdoproject.org/mailman/listinfo/users > > To unsubscribe: users-unsubscribe at lists.rdoproject.org From amoralej at redhat.com Thu Jul 2 14:17:56 2020 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Thu, 2 Jul 2020 16:17:56 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis wrote: > by the way in CentOS8, here is an error message I receive when searching > around > > [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo > "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks > Errors during downloading metadata for repository > 'rdo-trunk-ussuri-tested': > - Status code: 403 for > https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml > (IP: 3.87.151.16) > Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': > Cannot download repomd.xml: Cannot download repodata/repomd.xml: All > mirrors were tried > [stack at rdo-u ~]$ > > Yep, rdo-trunk-ussuri-tested repo included in the release rpm is disabled by default and not longer usable (i'll send a patch to retire it), don't enable it. Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead to install CentOS8 maintained kvm. BTW, i think that package should not be required in CentOS8: https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def > On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis wrote: > >> Hi All, >> >> I have one idea, why it might be the issue. >> >> during image creation step, I have hadd missing packets: >> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >> PCS thing can be found in HA repo, so I enabled it, but >> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >> >> I believe that is a case... >> so it installed non CentOS8 maintained kvm or some dependent packages.... >> >> How can I get osops-tools-monitoring-oschecks from centos repos? it is >> last seen in CentOS7 repos.... >> >> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >> osops-tools-monitoring-oschecks -A2 >> osops-tools-monitoring-oschecks.noarch >> 0.0.1-0.20191202171903.bafe3f0.el7 >> >> rdo-trunk-train-tested >> ostree-debuginfo.x86_64 2019.1-2.el7 >> base-debuginfo >> (undercloud) [stack at ironic-poc ~]$ >> >> can I somehow not include that package in image creation? OR if it is >> essential, can I create a different repo for that one? >> >> >> >> >> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >> wrote: >> >>> Hi all! >>> >>> Here we go, we are in the second part of this interesting >>> troubleshooting! >>> >>> 1) I have LogTool setup.Thank you Arkady. >>> >>> 2) I have user OSP to create instance, and I have used virsh to create >>> instance. 
>>> 2.1) OSP way is failing in either way, if it is volume-based or >>> image-based, it is failing either way.. [1] and [2] >>> 2.2) when I create it using CLI: [0] [3] >>> >>> any ideas what can be wrong? What options I should choose? >>> I have one network/vlan for whole cloud. I am doing proof of concept of >>> remote booting, so I do not have br-ex setup. and I do not have br-provider. >>> >>> There is my compute[5] and controller[6] yaml files, Please help, how it >>> should look like so it would have br-ex and br-int connected? as br-int now >>> is in UNKNOWN state. And br-ex do not exist. >>> As I understand, in roles data yaml, when we have tag external it should >>> create br-ex? or am I wrong? >>> >>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>> running. >>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs >>> [2] http://paste.openstack.org/show/795431/ < controller logs >>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>> [4] http://paste.openstack.org/show/795433/ < xml file for >>> [5] >>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>> [6] >>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>> >>> >>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>> wrote: >>> >>>> Hi all! >>>> >>>> I was able to analyze the attached log files and I hope that the >>>> results may help you understand what's going wrong with instance creation. >>>> You can find *Log_Tool's unique exported Error blocks* here: >>>> http://paste.openstack.org/show/795356/ >>>> >>>> *Some statistics and problematical messages:* >>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>> since: 2020-06-30 12:30:00 ##### >>>> Total_Number_Of_Errors --> 9 >>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>> >>>> *nova-compute.log* >>>> *default default] Error launching a defined domain with XML: >>> type='kvm'>* >>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>> 69134106b56941698e58c61... >>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >>>> *error*: qemu unexpectedly closed the monitor: >>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>> set MSR 0x48e to 0xfff9fffe04006172* >>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>> recent call last): >>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >>>> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>> >>>> *server.log * >>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>> 422} returned with failed status* >>>> >>>> *ovn_controller.log* >>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>> network 'datacentre'* >>>> >>>> Thanks! >>>> >>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>> >>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>> >>>>>>>> Thank you, I will try. 
I also modified a file, and it looked like >>>>>>>> it relaunched podman container once config was changed. Either way, if I >>>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>>> root, if commented out: >>>>>>>> #user = "root" >>>>>>>> #group = "root" >>>>>>>> >>>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU >>>>>>>> :) and it is really not AMD CPU. >>>>>>>> >>>>>>>> >>>>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>>>> ComputeS01Parameters: >>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>> ComputeS01ExtraConfig: >>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>> _______________________________________________ >>>>>>>> >>>>>>>> >>> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> > > > -- > Ruslanas Gžibovskis > +370 6030 7030 > _______________________________________________ > users mailing list > users at lists.rdoproject.org > http://lists.rdoproject.org/mailman/listinfo/users > > To unsubscribe: users-unsubscribe at lists.rdoproject.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 2 14:38:04 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 2 Jul 2020 17:38:04 +0300 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: it is, i have image build failing. i can modify yaml used to create image. can you remind me which files it would be? and your question, "how it can impact kvm": in image most of the packages get deployed from deloren repos. I believe part is from centos repos and part of whole packages in overcloud-full.qcow2 are from deloren. so it might have bit different minor version, that might be incompactible... at least it have happend for me previously with train release so i used tested ci fully from the beginning... I might be for sure wrong. On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, wrote: > > > On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis > wrote: > >> by the way in CentOS8, here is an error message I receive when searching >> around >> >> [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo >> "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks >> Errors during downloading metadata for repository >> 'rdo-trunk-ussuri-tested': >> - Status code: 403 for >> https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml >> (IP: 3.87.151.16) >> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': >> Cannot download repomd.xml: Cannot download repodata/repomd.xml: All >> mirrors were tried >> [stack at rdo-u ~]$ >> >> > Yep, rdo-trunk-ussuri-tested repo included in the release rpm is disabled > by default and not longer usable (i'll send a patch to retire it), don't > enable it. > > Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead to > install CentOS8 maintained kvm. 
BTW, i think that package should not be > required in CentOS8: > > > https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def > > > > >> On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis >> wrote: >> >>> Hi All, >>> >>> I have one idea, why it might be the issue. >>> >>> during image creation step, I have hadd missing packets: >>> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >>> PCS thing can be found in HA repo, so I enabled it, but >>> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >>> >>> I believe that is a case... >>> so it installed non CentOS8 maintained kvm or some dependent packages.... >>> >>> How can I get osops-tools-monitoring-oschecks from centos repos? it is >>> last seen in CentOS7 repos.... >>> >>> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >>> osops-tools-monitoring-oschecks -A2 >>> osops-tools-monitoring-oschecks.noarch >>> 0.0.1-0.20191202171903.bafe3f0.el7 >>> >>> rdo-trunk-train-tested >>> ostree-debuginfo.x86_64 2019.1-2.el7 >>> base-debuginfo >>> (undercloud) [stack at ironic-poc ~]$ >>> >>> can I somehow not include that package in image creation? OR if it is >>> essential, can I create a different repo for that one? >>> >>> >>> >>> >>> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >>> wrote: >>> >>>> Hi all! >>>> >>>> Here we go, we are in the second part of this interesting >>>> troubleshooting! >>>> >>>> 1) I have LogTool setup.Thank you Arkady. >>>> >>>> 2) I have user OSP to create instance, and I have used virsh to create >>>> instance. >>>> 2.1) OSP way is failing in either way, if it is volume-based or >>>> image-based, it is failing either way.. [1] and [2] >>>> 2.2) when I create it using CLI: [0] [3] >>>> >>>> any ideas what can be wrong? What options I should choose? >>>> I have one network/vlan for whole cloud. I am doing proof of concept of >>>> remote booting, so I do not have br-ex setup. and I do not have br-provider. >>>> >>>> There is my compute[5] and controller[6] yaml files, Please help, how >>>> it should look like so it would have br-ex and br-int connected? as >>>> br-int now is in UNKNOWN state. And br-ex do not exist. >>>> As I understand, in roles data yaml, when we have tag external it >>>> should create br-ex? or am I wrong? >>>> >>>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>>> running. >>>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute >>>> logs >>>> [2] http://paste.openstack.org/show/795431/ < controller logs >>>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>>> [4] http://paste.openstack.org/show/795433/ < xml file for >>>> [5] >>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>>> [6] >>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>>> >>>> >>>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>>> wrote: >>>> >>>>> Hi all! >>>>> >>>>> I was able to analyze the attached log files and I hope that the >>>>> results may help you understand what's going wrong with instance creation. 
>>>>> You can find *Log_Tool's unique exported Error blocks* here: >>>>> http://paste.openstack.org/show/795356/ >>>>> >>>>> *Some statistics and problematical messages:* >>>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>>> since: 2020-06-30 12:30:00 ##### >>>>> Total_Number_Of_Errors --> 9 >>>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>>> >>>>> *nova-compute.log* >>>>> *default default] Error launching a defined domain with XML: >>>> type='kvm'>* >>>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>>> 69134106b56941698e58c61... >>>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >>>>> *error*: qemu unexpectedly closed the monitor: >>>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>>> set MSR 0x48e to 0xfff9fffe04006172* >>>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>>> recent call last): >>>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >>>>> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>>> >>>>> *server.log * >>>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>>> 422} returned with failed status* >>>>> >>>>> *ovn_controller.log* >>>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>>> network 'datacentre'* >>>>> >>>>> Thanks! >>>>> >>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>>> >>>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>>> >>>>>>>>> Thank you, I will try. I also modified a file, and it looked like >>>>>>>>> it relaunched podman container once config was changed. Either way, if I >>>>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>>>> root, if commented out: >>>>>>>>> #user = "root" >>>>>>>>> #group = "root" >>>>>>>>> >>>>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU >>>>>>>>> :) and it is really not AMD CPU. >>>>>>>>> >>>>>>>>> >>>>>>>>> Just for fun, it might be important, here is how my node info >>>>>>>>> looks. >>>>>>>>> ComputeS01Parameters: >>>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>>> ComputeS01ExtraConfig: >>>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>>> _______________________________________________ >>>>>>>>> >>>>>>>>> >>>> >>> >>> -- >>> Ruslanas Gžibovskis >>> +370 6030 7030 >>> >> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> _______________________________________________ >> users mailing list >> users at lists.rdoproject.org >> http://lists.rdoproject.org/mailman/listinfo/users >> >> To unsubscribe: users-unsubscribe at lists.rdoproject.org >> > -------------- next part -------------- An HTML attachment was scrubbed... 
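For reference, a minimal sketch of where the image-build configuration asked
about at the top of the message above usually lives on a CentOS 8 / Ussuri
undercloud. The directory and file names below are assumptions on my side
(they move around between releases), so list the directory first rather than
taking them as given:

# see which image definitions tripleo-common ships (assumed path)
$ ls /usr/share/openstack-tripleo-common/image-yaml/

# typical Ussuri / CentOS 8 invocation, reusing two of the shipped definitions
$ openstack overcloud image build \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml

The package and element lists those YAML files pull in come from
tripleo-puppet-elements, so that is the usual place to adjust what ends up
inside overcloud-full.qcow2.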
URL: From ltoscano at redhat.com Thu Jul 2 14:48:05 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Thu, 02 Jul 2020 16:48:05 +0200 Subject: [nova][stable] The openstack/nova stable/ocata branch is currently unmaintained In-Reply-To: <20200702140528.yrwrpyv6nt72kzlb@lyarwood.usersys.redhat.com> References: <20200702140528.yrwrpyv6nt72kzlb@lyarwood.usersys.redhat.com> Message-ID: <3422063.e9J7NaK4W3@whitebase.usersys.redhat.com> On Thursday, 2 July 2020 16:05:28 CEST Lee Yarwood wrote: > Hello all, > > A quick note to highlight that the stable/ocata branch of openstack/nova > [1] is formally in the ``Unmaintained`` [2] phase of maintenance will be > moved on to the final ``EOL`` phase after a total of 6 months of > inactivity. > > I'm going to suggest that we ignore the following change as this only > attempted to remove a job from the experimental queue and doesn't > constitute actual maintenance of the branch IMHO. > > Remove exp legacy-tempest-dsvm-full-devstack-plugin-nfs > https://review.opendev.org/#/c/714958/ The purpose of that change is to remove a job which is going to be removed from cinder too (hopefully) and finally from project-config. If ocata moves to EOL it will be possible to clean that legacy job too, so fine by me! > > As a result I consider the branch to have been inactive for 3 of the > required 6 months before it can be marked as ``EOL`` [3]. > > Volunteers are welcome to step forward and attempt to move the branch > back to the ``Extended Maintenance`` phase by proposing changes and > fixing CI in the next 3 months, otherwise the branch will be marked as > ``EOL``. And if anyone does, make sure to merge my change above :) Ciao -- Luigi From rafaelweingartner at gmail.com Thu Jul 2 14:49:52 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 2 Jul 2020 11:49:52 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: > Since the merging window for ussuri was long passed for those commits, is > it safe to assume that it will not land in stable/ussuri at all and those > will be available for victoria? > I would say so. We are lacking people to review and then merge it. How safe is to cherry pick those commits and use them in production? > As long as the person executing the cherry-picks, and maintaining the code knows what she/he is doing, you should be safe. The guys that are using this implementation (and others that I and my colleagues proposed), have a few openstack components that are customized with the patches/enhancements/extensions we developed so far; this means, they are not using the community version, but something in-between (the community releases + the patches we did). Of course, it is only possible, because we are the ones creating and maintaining these codes; therefore, we can assure quality for production. On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: > Hello Rafael, > > Since the merging window for ussuri was long passed for those commits, is > it safe to assume that it will not land in stable/ussuri at all and those > will be available for victoria? > > How safe is to cherry pick those commits and use them in production? > > > > On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> The dynamic pollster in Ceilometer will be first released in Ussuri. 
>> However, there are some important PRs still waiting for a merge, that might >> be important for your use case: >> * https://review.opendev.org/#/c/722092/ >> * https://review.opendev.org/#/c/715180/ >> * https://review.opendev.org/#/c/715289/ >> * https://review.opendev.org/#/c/679999/ >> * https://review.opendev.org/#/c/709807/ >> >> >> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves >> wrote: >> >>> >>> >>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>> >>>> Hello, >>>> >>>> I want to meter the loadbalancer into gnocchi for billing purposes in >>>> stein/train and ceilometer doesn't support dynamic pollsters. >>>> >>> >>> I think I misunderstood your use case, sorry. I read it as if you wanted >>> to know "if a loadbalancer was deployed and has status active". >>> >>> >>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>> >>> >>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>> Ceilometer project. >>> >>> >>>> >>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>> cgoncalves at redhat.com> wrote: >>>> >>>>> Hi Ionut, >>>>> >>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>>> >>>>>> Hello guys, >>>>>> I was trying to add in polling.yaml and pipeline from ceilometer the >>>>>> following: >>>>>> - network.services.lb.active.connections >>>>>> - network.services.lb.health_monitor >>>>>> - network.services.lb.incoming.bytes >>>>>> - network.services.lb.listener >>>>>> - network.services.lb.loadbalancer >>>>>> - network.services.lb.member >>>>>> - network.services.lb.outgoing.bytes >>>>>> - network.services.lb.pool >>>>>> - network.services.lb.total.connections >>>>>> >>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>> supported in neutron. >>>>>> >>>>>> I found >>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>> but this is not available in stein or train. >>>>>> >>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>> octavia. >>>>>> I mostly want for start to just meter if a loadbalancer was deployed >>>>>> and has status active. >>>>>> >>>>> >>>>> You can get the provisioning and operating status of Octavia load >>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>> three API endpoints for statistics [2][3][4]. >>>>> >>>>> I hope this helps with your use case. >>>>> >>>>> Cheers, >>>>> Carlos >>>>> >>>>> [1] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>> [2] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>> [3] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>> [4] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>> >>>>> >>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... 
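To make the cherry-pick route discussed above a bit more concrete, here is a
rough sketch of carrying one of the listed reviews on top of stable/ussuri.
The patchset number (the trailing /3) and the local branch name are
placeholders of mine, so take the exact ref from the download box of each
Gerrit change:

$ git clone https://opendev.org/openstack/ceilometer && cd ceilometer
$ git checkout -b ussuri-dynamic-pollster origin/stable/ussuri

# fetch and apply one change, e.g. https://review.opendev.org/#/c/722092/
$ git fetch https://review.opendev.org/openstack/ceilometer refs/changes/92/722092/3
$ git cherry-pick FETCH_HEAD    # resolve any conflicts, then run the unit tests

Repeat for the remaining reviews in dependency order and build packages from
the resulting branch; as noted above, whoever applies these also carries the
maintenance burden for that delta until the patches land upstream.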
URL: From akekane at redhat.com Thu Jul 2 15:22:30 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 2 Jul 2020 20:52:30 +0530 Subject: [glance] Weekly review priorities Message-ID: Hi Team, We are 3 weeks away from the Victoria milestone 2 release and already our review stack is increasing day by day. We need to get below important specs reviewed and merged in the next couple of weeks. Also we need some reviews on backports as well as important fixes. I have sorted down some patches which need reviews for this week. Specs: 1. sparse image upload - https://review.opendev.org/733157 2. Unified limits - https://review.opendev.org/729187 3. Image encryption - https://review.opendev.org/609667 4. Cinder store multiple stores support - https://review.opendev.org/695152 5. Duplicated image downloads - https://review.opendev.org/734683 6. Add copy-unowned-image spec https://review.opendev.org/739062 Backports: 1. Add lock per share for cinder nfs mount/umount - https://review.opendev.org/#/c/726650/ (stable/train) 2. Add lock per share for cinder nfs mount/umount - https://review.opendev.org/#/c/726914/ (stable/ussuri) 3. zuul: switch to the "plain" grenade job here too - https://review.opendev.org/739056 4. Use grenade-multinode instead of the custom legacy job - https://review.opendev.org/738693 Bug fixes on master: 1. Add image_set_property_atomic() helper - https://review.opendev.org/737868 2. Fix race condition in copy image operation - https://review.opendev.org/737596 3. Don't include plugins on 'copy-image' import - https://review.opendev.org/738675 4. Fix: Interrupted copy-image leaking data on subsequent operation - https://review.opendev.org/737867 Cleanup patches: 1. Removal of 'enable_v2_api' - https://review.opendev.org/#/c/738672/ (review dependency chain as well) Happy reviewing!! Abhishek -------------- next part -------------- An HTML attachment was scrubbed... URL: From amoralej at redhat.com Thu Jul 2 15:35:07 2020 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Thu, 2 Jul 2020 17:35:07 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis wrote: > it is, i have image build failing. i can modify yaml used to create image. > can you remind me which files it would be? > > Right, I see that the patch must not be working fine for centos and the package is being installed from delorean repos in the log. I guess it needs an entry to cover the centos 8 case (i'm checking with opstools maintainer). As workaround I'd propose you to use the package from: https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ or alternatively applying some local patch to tripleo-puppet-elements. > and your question, "how it can impact kvm": > > in image most of the packages get deployed from deloren repos. I believe > part is from centos repos and part of whole packages in > overcloud-full.qcow2 are from deloren. so it might have bit different minor > version, that might be incompactible... at least it have happend for me > previously with train release so i used tested ci fully from the > beginning... > I might be for sure wrong. > Delorean repos contain only OpenStack packages, things like nova, etc... not kvm or things included in CentOS repos. KVM will always installed which should be installed from "Advanced Virtualization" repository. 
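One quick way to answer the version question that follows is to query the RPM
database inside the image. This is only a sketch: it assumes libguestfs-tools
is available on the undercloud, that the host rpm can read the image's RPM
database, and the mount point and queried package names are just examples:

$ sudo dnf install -y libguestfs-tools
$ mkdir -p /tmp/oc-img
$ sudo guestmount --ro -a overcloud-full.qcow2 -i /tmp/oc-img
$ sudo rpm --root /tmp/oc-img -q qemu-kvm libvirt-daemon-kvm
$ sudo guestunmount /tmp/oc-img

If the versions printed do not match the advanced-virtualization builds
referenced below, that mismatch is worth ruling out as a cause of the qemu-kvm
MSR failure earlier in the thread before digging further.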
May you check what versions of qemu-kvm and libvirt you got installed into the overcloud-full image?, it should match with the versions in: http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm > > On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, > wrote: > >> >> >> On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis >> wrote: >> >>> by the way in CentOS8, here is an error message I receive when searching >>> around >>> >>> [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo >>> "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks >>> Errors during downloading metadata for repository >>> 'rdo-trunk-ussuri-tested': >>> - Status code: 403 for >>> https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml >>> (IP: 3.87.151.16) >>> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': >>> Cannot download repomd.xml: Cannot download repodata/repomd.xml: All >>> mirrors were tried >>> [stack at rdo-u ~]$ >>> >>> >> Yep, rdo-trunk-ussuri-tested repo included in the release rpm is disabled >> by default and not longer usable (i'll send a patch to retire it), don't >> enable it. >> >> Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead >> to install CentOS8 maintained kvm. BTW, i think that package should not be >> required in CentOS8: >> >> >> https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def >> >> >> >> >>> On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis >>> wrote: >>> >>>> Hi All, >>>> >>>> I have one idea, why it might be the issue. >>>> >>>> during image creation step, I have hadd missing packets: >>>> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >>>> PCS thing can be found in HA repo, so I enabled it, but >>>> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >>>> >>>> I believe that is a case... >>>> so it installed non CentOS8 maintained kvm or some dependent >>>> packages.... >>>> >>>> How can I get osops-tools-monitoring-oschecks from centos repos? it is >>>> last seen in CentOS7 repos.... >>>> >>>> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >>>> osops-tools-monitoring-oschecks -A2 >>>> osops-tools-monitoring-oschecks.noarch >>>> 0.0.1-0.20191202171903.bafe3f0.el7 >>>> >>>> rdo-trunk-train-tested >>>> ostree-debuginfo.x86_64 2019.1-2.el7 >>>> base-debuginfo >>>> (undercloud) [stack at ironic-poc ~]$ >>>> >>>> can I somehow not include that package in image creation? OR if it is >>>> essential, can I create a different repo for that one? >>>> >>>> >>>> >>>> >>>> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> Hi all! >>>>> >>>>> Here we go, we are in the second part of this interesting >>>>> troubleshooting! >>>>> >>>>> 1) I have LogTool setup.Thank you Arkady. >>>>> >>>>> 2) I have user OSP to create instance, and I have used virsh to create >>>>> instance. >>>>> 2.1) OSP way is failing in either way, if it is volume-based or >>>>> image-based, it is failing either way.. [1] and [2] >>>>> 2.2) when I create it using CLI: [0] [3] >>>>> >>>>> any ideas what can be wrong? What options I should choose? >>>>> I have one network/vlan for whole cloud. I am doing proof of concept >>>>> of remote booting, so I do not have br-ex setup. and I do not have >>>>> br-provider. 
>>>>> >>>>> There is my compute[5] and controller[6] yaml files, Please help, how >>>>> it should look like so it would have br-ex and br-int connected? as >>>>> br-int now is in UNKNOWN state. And br-ex do not exist. >>>>> As I understand, in roles data yaml, when we have tag external it >>>>> should create br-ex? or am I wrong? >>>>> >>>>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>>>> running. >>>>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute >>>>> logs >>>>> [2] http://paste.openstack.org/show/795431/ < controller logs >>>>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>>>> [4] http://paste.openstack.org/show/795433/ < xml file for >>>>> [5] >>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>>>> [6] >>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>>>> >>>>> >>>>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>>>> wrote: >>>>> >>>>>> Hi all! >>>>>> >>>>>> I was able to analyze the attached log files and I hope that the >>>>>> results may help you understand what's going wrong with instance creation. >>>>>> You can find *Log_Tool's unique exported Error blocks* here: >>>>>> http://paste.openstack.org/show/795356/ >>>>>> >>>>>> *Some statistics and problematical messages:* >>>>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>>>> since: 2020-06-30 12:30:00 ##### >>>>>> Total_Number_Of_Errors --> 9 >>>>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>>>> >>>>>> *nova-compute.log* >>>>>> *default default] Error launching a defined domain with XML: >>>>> type='kvm'>* >>>>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>>>> 69134106b56941698e58c61... >>>>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: >>>>>> internal *error*: qemu unexpectedly closed the monitor: >>>>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>>>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>>>> set MSR 0x48e to 0xfff9fffe04006172* >>>>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>>>> recent call last): >>>>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager >>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>>>> >>>>>> *server.log * >>>>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>>>> 422} returned with failed status* >>>>>> >>>>>> *ovn_controller.log* >>>>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>>>> network 'datacentre'* >>>>>> >>>>>> Thanks! >>>>>> >>>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>>>> >>>>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>>>> >>>>>>>>>> Thank you, I will try. I also modified a file, and it looked like >>>>>>>>>> it relaunched podman container once config was changed. 
Either way, if I >>>>>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>>>>> root, if commented out: >>>>>>>>>> #user = "root" >>>>>>>>>> #group = "root" >>>>>>>>>> >>>>>>>>>> also in some logs, I saw, that it detected, that it is not AMD >>>>>>>>>> CPU :) and it is really not AMD CPU. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Just for fun, it might be important, here is how my node info >>>>>>>>>> looks. >>>>>>>>>> ComputeS01Parameters: >>>>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>>>> ComputeS01ExtraConfig: >>>>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>>>> _______________________________________________ >>>>>>>>>> >>>>>>>>>> >>>>> >>>> >>>> -- >>>> Ruslanas Gžibovskis >>>> +370 6030 7030 >>>> >>> >>> >>> -- >>> Ruslanas Gžibovskis >>> +370 6030 7030 >>> _______________________________________________ >>> users mailing list >>> users at lists.rdoproject.org >>> http://lists.rdoproject.org/mailman/listinfo/users >>> >>> To unsubscribe: users-unsubscribe at lists.rdoproject.org >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amoralej at redhat.com Thu Jul 2 16:03:17 2020 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Thu, 2 Jul 2020 18:03:17 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 5:35 PM Alfredo Moralejo Alonso wrote: > > > On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis > wrote: > >> it is, i have image build failing. i can modify yaml used to create >> image. can you remind me which files it would be? >> >> > Right, I see that the patch must not be working fine for centos and the > package is being installed from delorean repos in the log. I guess it > needs an entry to cover the centos 8 case (i'm checking with opstools > maintainer). > https://review.opendev.org/739085 > As workaround I'd propose you to use the package from: > > > https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ > > or alternatively applying some local patch to tripleo-puppet-elements. > > >> and your question, "how it can impact kvm": >> >> in image most of the packages get deployed from deloren repos. I believe >> part is from centos repos and part of whole packages in >> overcloud-full.qcow2 are from deloren. so it might have bit different minor >> version, that might be incompactible... at least it have happend for me >> previously with train release so i used tested ci fully from the >> beginning... >> I might be for sure wrong. >> > > Delorean repos contain only OpenStack packages, things like nova, etc... > not kvm or things included in CentOS repos. KVM will always installed which > should be installed from "Advanced Virtualization" repository. 
May you > check what versions of qemu-kvm and libvirt you got installed into the > overcloud-full image?, it should match with the versions in: > > > http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ > > like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm > > >> >> On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, >> wrote: >> >>> >>> >>> On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis >>> wrote: >>> >>>> by the way in CentOS8, here is an error message I receive when >>>> searching around >>>> >>>> [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo >>>> "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks >>>> Errors during downloading metadata for repository >>>> 'rdo-trunk-ussuri-tested': >>>> - Status code: 403 for >>>> https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml >>>> (IP: 3.87.151.16) >>>> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': >>>> Cannot download repomd.xml: Cannot download repodata/repomd.xml: All >>>> mirrors were tried >>>> [stack at rdo-u ~]$ >>>> >>>> >>> Yep, rdo-trunk-ussuri-tested repo included in the release rpm is >>> disabled by default and not longer usable (i'll send a patch to retire it), >>> don't enable it. >>> >>> Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead >>> to install CentOS8 maintained kvm. BTW, i think that package should not be >>> required in CentOS8: >>> >>> >>> https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def >>> >>> >>> >>> >>>> On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> Hi All, >>>>> >>>>> I have one idea, why it might be the issue. >>>>> >>>>> during image creation step, I have hadd missing packets: >>>>> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >>>>> PCS thing can be found in HA repo, so I enabled it, but >>>>> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >>>>> >>>>> I believe that is a case... >>>>> so it installed non CentOS8 maintained kvm or some dependent >>>>> packages.... >>>>> >>>>> How can I get osops-tools-monitoring-oschecks from centos repos? it >>>>> is last seen in CentOS7 repos.... >>>>> >>>>> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >>>>> osops-tools-monitoring-oschecks -A2 >>>>> osops-tools-monitoring-oschecks.noarch >>>>> 0.0.1-0.20191202171903.bafe3f0.el7 >>>>> >>>>> rdo-trunk-train-tested >>>>> ostree-debuginfo.x86_64 2019.1-2.el7 >>>>> base-debuginfo >>>>> (undercloud) [stack at ironic-poc ~]$ >>>>> >>>>> can I somehow not include that package in image creation? OR if it is >>>>> essential, can I create a different repo for that one? >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >>>>> wrote: >>>>> >>>>>> Hi all! >>>>>> >>>>>> Here we go, we are in the second part of this interesting >>>>>> troubleshooting! >>>>>> >>>>>> 1) I have LogTool setup.Thank you Arkady. >>>>>> >>>>>> 2) I have user OSP to create instance, and I have used virsh to >>>>>> create instance. >>>>>> 2.1) OSP way is failing in either way, if it is volume-based or >>>>>> image-based, it is failing either way.. [1] and [2] >>>>>> 2.2) when I create it using CLI: [0] [3] >>>>>> >>>>>> any ideas what can be wrong? What options I should choose? >>>>>> I have one network/vlan for whole cloud. I am doing proof of concept >>>>>> of remote booting, so I do not have br-ex setup. and I do not have >>>>>> br-provider. 
>>>>>> >>>>>> There is my compute[5] and controller[6] yaml files, Please help, how >>>>>> it should look like so it would have br-ex and br-int connected? as >>>>>> br-int now is in UNKNOWN state. And br-ex do not exist. >>>>>> As I understand, in roles data yaml, when we have tag external it >>>>>> should create br-ex? or am I wrong? >>>>>> >>>>>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>>>>> running. >>>>>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute >>>>>> logs >>>>>> [2] http://paste.openstack.org/show/795431/ < controller logs >>>>>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>>>>> [4] http://paste.openstack.org/show/795433/ < xml file for >>>>>> [5] >>>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>>>>> [6] >>>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>>>>> >>>>>> >>>>>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>>>>> wrote: >>>>>> >>>>>>> Hi all! >>>>>>> >>>>>>> I was able to analyze the attached log files and I hope that the >>>>>>> results may help you understand what's going wrong with instance creation. >>>>>>> You can find *Log_Tool's unique exported Error blocks* here: >>>>>>> http://paste.openstack.org/show/795356/ >>>>>>> >>>>>>> *Some statistics and problematical messages:* >>>>>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>>>>> since: 2020-06-30 12:30:00 ##### >>>>>>> Total_Number_Of_Errors --> 9 >>>>>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>>>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>>>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>>>>> >>>>>>> *nova-compute.log* >>>>>>> *default default] Error launching a defined domain with XML: >>>>>> type='kvm'>* >>>>>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>>>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>>>>> 69134106b56941698e58c61... >>>>>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: >>>>>>> internal *error*: qemu unexpectedly closed the monitor: >>>>>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR >>>>>>> 0... >>>>>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>>>>> set MSR 0x48e to 0xfff9fffe04006172* >>>>>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>>>>> recent call last): >>>>>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager >>>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>>>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>>>>> >>>>>>> *server.log * >>>>>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>>>>> 422} returned with failed status* >>>>>>> >>>>>>> *ovn_controller.log* >>>>>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>>>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>>>>> network 'datacentre'* >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>>>>> >>>>>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>>>>> >>>>>>>>>>> Thank you, I will try. I also modified a file, and it looked >>>>>>>>>>> like it relaunched podman container once config was changed. 
Either way, if >>>>>>>>>>> I understand Linux config correctly, the default value for user and group >>>>>>>>>>> is root, if commented out: >>>>>>>>>>> #user = "root" >>>>>>>>>>> #group = "root" >>>>>>>>>>> >>>>>>>>>>> also in some logs, I saw, that it detected, that it is not AMD >>>>>>>>>>> CPU :) and it is really not AMD CPU. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Just for fun, it might be important, here is how my node info >>>>>>>>>>> looks. >>>>>>>>>>> ComputeS01Parameters: >>>>>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>>>>> ComputeS01ExtraConfig: >>>>>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> >>>>>>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Ruslanas Gžibovskis >>>>> +370 6030 7030 >>>>> >>>> >>>> >>>> -- >>>> Ruslanas Gžibovskis >>>> +370 6030 7030 >>>> _______________________________________________ >>>> users mailing list >>>> users at lists.rdoproject.org >>>> http://lists.rdoproject.org/mailman/listinfo/users >>>> >>>> To unsubscribe: users-unsubscribe at lists.rdoproject.org >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Thu Jul 2 16:55:47 2020 From: ashlee at openstack.org (Ashlee Ferguson) Date: Thu, 2 Jul 2020 11:55:47 -0500 Subject: [Airship-discuss] [2020 Summit] Programming Committee Nominations Open In-Reply-To: <97832365-8405-4277-BFA6-64BB9F9C1F43@openstack.org> References: <97832365-8405-4277-BFA6-64BB9F9C1F43@openstack.org> Message-ID: <576987C3-647F-43C0-AE6E-F4F3C189DCE4@openstack.org> Hi everyone, Just a reminder that Programming Committee nominations for the 2020 Open Infrastructure Summit are open. If you’re an expert in any of the below categories, and would like to help program the Summit content, please fill out this form to nominate yourself or someone else: https://openstackfoundation.formstack.com/forms/programmingcommitteenom_summit2020 Thanks! Ashlee Ashlee Ferguson Community & Events Coordinator OpenStack Foundation > On Jun 24, 2020, at 12:06 PM, Ashlee Ferguson wrote: > > Programming Committee nominations for the 2020 Open Infrastructure Summit are open! > > Programming Committees for each Track will help build the Summit schedule, and are made up of individuals working in open infrastructure. Responsibilities include: > • Help the Summit team put together the best possible content based on your subject matter expertise > • Promote the individual Tracks within your networks > • Review the submissions and Community voting results in your particular Track > • Determine if there are any major content gaps in your Track, and if so, potentially solicit additional speakers directly to submit > • Ensure diversity of speakers and companies represented in your Track > • Avoid vendor sales pitches, focusing more on real-world user stories and technical, in-the-trenches experiences > > 2020 Summit Tracks: > • 5G, NFV & Edge > • AI, Machine Learning & HPC > • CI/CD > • Container Infrastructure > • Getting Started > • Hands-on Workshops > • Open Development > • Private & Hybrid Cloud > • Public Cloud > • Security > > If you’re interested in nominating yourself or someone else to be a member of the Summit Programming Committee for a specific Track, please fill out the nomination form[1]. 
Nominations will close on July 10, 2020. > > NOMINATION FORM[1] > > Programming Committee selections will occur before we open the Call for Presentations (CFP) to receive presentations so that the Committees can host office hours to consult on submissions, and help promote the event. > > The CFP will be open July 1 - August 4, 2020. > > Please email speakersupport at openstack.org with any questions or feedback. > > Cheers, > Ashlee > > [1] https://openstackfoundation.formstack.com/forms/programmingcommitteenom_summit2020 > > > Ashlee Ferguson > Community & Events Coordinator > OpenStack Foundation > > > _______________________________________________ > Airship-discuss mailing list > Airship-discuss at lists.airshipit.org > http://lists.airshipit.org/cgi-bin/mailman/listinfo/airship-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Jul 2 20:59:43 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 2 Jul 2020 22:59:43 +0200 Subject: [neutron] Drivers meeting agenda - 03.07.2020 Message-ID: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> Hi, For tomorrows drivers meeting we have 1 RFE to discuss: https://bugs.launchpad.net/neutron/+bug/1885921 - [RFE][floatingip port_forwarding] Add port ranges See You all on the meeting tomorrow. — Slawek Kaplonski Senior software engineer Red Hat From miguel at mlavalle.com Thu Jul 2 21:04:04 2020 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 2 Jul 2020 16:04:04 -0500 Subject: [neutron] Drivers meeting agenda - 03.07.2020 In-Reply-To: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> References: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> Message-ID: Hi Slawek, Saturday is 4th of July, the US Independence day. Many employers, like mine, are giving us tomorrow off. It may also be the case for the RH members of this team based in the US Cheers Miguel On Thu, Jul 2, 2020 at 3:59 PM Slawek Kaplonski wrote: > Hi, > > For tomorrows drivers meeting we have 1 RFE to discuss: > > https://bugs.launchpad.net/neutron/+bug/1885921 - [RFE][floatingip > port_forwarding] Add port ranges > > See You all on the meeting tomorrow. > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Thu Jul 2 21:22:57 2020 From: amy at demarco.com (Amy Marrich) Date: Thu, 2 Jul 2020 16:22:57 -0500 Subject: [Diversity] Diversity & Inclusion WG Meeting 7/6 Message-ID: The Diversity & Inclusion WG invites members of all OSF projects to our next meeting Monday, July 6th, at 17:00 UTC in the #openstack-diversity channel. The agenda can be found at https://etherpad.openstack.org/p/diversity -wg-agenda. We will be discussing changing our Wiki page to reflect the broader OSF projects and communities so that the page reflects our mission. Please feel free to add any other topics you wish to discuss at the meeting. Thanks, Amy (spotz) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Thu Jul 2 21:45:00 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 2 Jul 2020 14:45:00 -0700 Subject: [TC] [all] OSU Intern Work Message-ID: Hello! As you may or may not know, the OSF funded a student at Oregon State University last year to work on OpenStack part time. He did a lot of amazing work on Glance but sadly we are coming to the end of his internship as he will be graduating soon. 
I'm happy to report that we have the budget to fund another student part time to work on OpenStack again and I wanted to collect suggestions of projects/areas that a student could be helpful in. It is important to note, that they will only be working part time and, while I will be helping to mentor them, I will likely need a co-mentor in the area/topic to help me get them going, get their patches reviewed, answer questions as they go etc. Originally, I had thought about assigning them to Glance (like this past year) or Designate (like we had considered last year), but now I am thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a better fit? If you are interested in helping mentor a student in any of those areas or have a better idea I am all ears :) I look forward to your suggestions. -Kendall (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Thu Jul 2 21:52:44 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 2 Jul 2020 14:52:44 -0700 Subject: [all][TC] New Office Hours Times Message-ID: Hello! It's been a while since the office hours had been refreshed and we have a lot of new people on the TC that were not around when the times were set. In an effort to stir things up a bit, and get more community engagement, we are picking new times! I want to invite everyone in the community interested in interacting more with the TC to respond to the poll so we have your input as the office hours are really for your benefit anyway. (Nevermind the name of the poll :) Too much work to remake the whole thing just to rename it..) That said, we do need responses from ALL TC members so that we can also document who will (typically) be present for each office hour as well. (Also, thanks Mohammed for putting the poll together! It's no joke. ) -Kendall (diablo_rojo) [1] https://doodle.com/poll/q27t8pucq7b8xbme -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jul 2 22:08:12 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 02 Jul 2020 17:08:12 -0500 Subject: [TC] [all] OSU Intern Work In-Reply-To: References: Message-ID: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> ---- On Thu, 02 Jul 2020 16:45:00 -0500 Kendall Nelson wrote ---- > Hello! > As you may or may not know, the OSF funded a student at Oregon State University last year to work on OpenStack part time. He did a lot of amazing work on Glance but sadly we are coming to the end of his internship as he will be graduating soon. I'm happy to report that we have the budget to fund another student part time to work on OpenStack again and I wanted to collect suggestions of projects/areas that a student could be helpful in. > It is important to note, that they will only be working part time and, while I will be helping to mentor them, I will likely need a co-mentor in the area/topic to help me get them going, get their patches reviewed, answer questions as they go etc. > Originally, I had thought about assigning them to Glance (like this past year) or Designate (like we had considered last year), but now I am thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a better fit? If you are interested in helping mentor a student in any of those areas or have a better idea I am all ears :) > I look forward to your suggestions. Thanks Kendal for starting this. +100 for OSC help. 
Also we should consider upstream-investment-opportunities list which is our help needed things in community and we really look for some help on that since starting. For example, help on 'Consistent and Secure Policy Defaults' can be good thing to contribute which is a popup team in this cycle too[2], Raildo and myself can help for mentorship in this. [1] https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2020/index.html [2] https://governance.openstack.org/tc/reference/popup-teams.html#secure-default-policies -gmann > -Kendall (diablo_rojo) From kennelson11 at gmail.com Thu Jul 2 22:18:49 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 2 Jul 2020 15:18:49 -0700 Subject: [TC] [all] OSU Intern Work In-Reply-To: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> References: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> Message-ID: On Thu, Jul 2, 2020 at 3:08 PM Ghanshyam Mann wrote: > ---- On Thu, 02 Jul 2020 16:45:00 -0500 Kendall Nelson < > kennelson11 at gmail.com> wrote ---- > > Hello! > > As you may or may not know, the OSF funded a student at Oregon State > University last year to work on OpenStack part time. He did a lot of > amazing work on Glance but sadly we are coming to the end of his internship > as he will be graduating soon. I'm happy to report that we have the budget > to fund another student part time to work on OpenStack again and I wanted > to collect suggestions of projects/areas that a student could be helpful > in. > > It is important to note, that they will only be working part time and, > while I will be helping to mentor them, I will likely need a co-mentor in > the area/topic to help me get them going, get their patches reviewed, > answer questions as they go etc. > > Originally, I had thought about assigning them to Glance (like this > past year) or Designate (like we had considered last year), but now I am > thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a > better fit? If you are interested in helping mentor a student in any of > those areas or have a better idea I am all ears :) > > I look forward to your suggestions. > > Thanks Kendal for starting this. > > +100 for OSC help. > > Also we should consider upstream-investment-opportunities list which is > our help needed things in community and > we really look for some help on that since starting. For example, help on > 'Consistent and Secure Policy Defaults' can > be good thing to contribute which is a popup team in this cycle too[2], > Raildo and myself can help for mentorship in this. > I will definitely take a look at the list, but my understanding was that we wanted someone to work on those things that would be sticking around a little more long term and full time? I can only guarantee the student will be around for the school year and only part time. If I'm wrong, I can definitely rank the policy work a little higher on my list :) > [1] > https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2020/index.html > [2] > https://governance.openstack.org/tc/reference/popup-teams.html#secure-default-policies > > -gmann > > > > -Kendall (diablo_rojo) > -Kendall (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skaplons at redhat.com Fri Jul 3 07:53:10 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 3 Jul 2020 09:53:10 +0200 Subject: [neutron] Drivers meeting agenda - 03.07.2020 In-Reply-To: References: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> Message-ID: <4B3B5595-3BB9-48C0-ACBB-87C3C8A58AD7@redhat.com> Hi, Thx for info Miguel. I didn’t know that You have day off on Friday. I’m not sure if that is the case for others too. Lets see if we will have quorum on the meeting then. If not, we will skip it for this week :) Have a great long weekend :) > On 2 Jul 2020, at 23:04, Miguel Lavalle wrote: > > Hi Slawek, > > Saturday is 4th of July, the US Independence day. Many employers, like mine, are giving us tomorrow off. It may also be the case for the RH members of this team based in the US > > Cheers > > Miguel > > On Thu, Jul 2, 2020 at 3:59 PM Slawek Kaplonski wrote: > Hi, > > For tomorrows drivers meeting we have 1 RFE to discuss: > > https://bugs.launchpad.net/neutron/+bug/1885921 - [RFE][floatingip port_forwarding] Add port ranges > > See You all on the meeting tomorrow. > > — > Slawek Kaplonski > Senior software engineer > Red Hat > — Slawek Kaplonski Senior software engineer Red Hat From moguimar at redhat.com Fri Jul 3 08:28:27 2020 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Fri, 3 Jul 2020 10:28:27 +0200 Subject: [oslo] PTO on Monday In-Reply-To: <6e2dc5a8-434d-380a-241c-5b29d26f8f12@nemebean.com> References: <6e2dc5a8-434d-380a-241c-5b29d26f8f12@nemebean.com> Message-ID: Monday will also be a holiday in the Czech Republic. On Wed, Jul 1, 2020 at 11:58 PM Ben Nemec wrote: > Hi Oslo, > > I'm making this a four day weekend (Friday is a US holiday), so I won't > be around for the meeting on Monday. If someone else wants to run it > then feel free to hold it without me. Otherwise we'll return to the > regular schedule the following week. > > -Ben > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaronzhu1121 at gmail.com Fri Jul 3 09:08:03 2020 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Fri, 3 Jul 2020 17:08:03 +0800 Subject: [Telemetry] Propose Matthias Runge for Telemetry core In-Reply-To: References: Message-ID: Welcome Matthias, I have added you to the ceilometer core team. Lingxian Kong 于2020年6月24日 周三09:57写道: > +1 welcome! > > --- > Lingxian Kong > Senior Software Engineer > Catalyst Cloud > www.catalystcloud.nz > > > On Tue, Jun 23, 2020 at 11:47 PM Rong Zhu wrote: > >> Hello all, >> >> Matthias Runge have been very active in the repository with patches and >> reviews. >> So I would like to propose adding Matthias as core developer for the >> telemetry project. >> >> Please, feel free to add your votes into the thread. >> -- >> Thanks, >> Rong Zhu >> > -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuel.mutel at gmail.com Fri Jul 3 09:25:01 2020 From: samuel.mutel at gmail.com (Samuel Mutel) Date: Fri, 3 Jul 2020 11:25:01 +0200 Subject: [Telemetry] Error when sending to prometheus pushgateway Message-ID: Hello, I have two questions about ceilometer (openstack version rocky). - First of all, it seems that ceilometer is sending metrics every hour and I don't understand why. - Next, I am not able to setup ceilometer to send metrics to prometheus pushgateway. 
Here is my configuration: > sources: > - name: meter_file > interval: 30 > meters: > - "*" > sinks: > - prometheus > > sinks: > - name: prometheus > publishers: > - prometheus://10.60.4.11:9091/metrics/job/ceilometer > Here is the error I received: > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 > # TYPE memory gauge > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2048 > # TYPE disk.ephemeral.size gauge > disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > # TYPE disk.root.size gauge > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > : HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http Traceback > (most recent call last): > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", line 178, > in _do_post > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > res.raise_for_status() > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/requests/models.py", line 935, in > raise_for_status > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http raise > HTTPError(http_error_msg, response=self) > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http HTTPError: > 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > Thanks for your help on this topic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrunge at matthias-runge.de Fri Jul 3 10:05:24 2020 From: mrunge at matthias-runge.de (Matthias Runge) Date: Fri, 3 Jul 2020 12:05:24 +0200 Subject: [Telemetry] Propose Matthias Runge for Telemetry core In-Reply-To: References: Message-ID: <9e445a50-3d6b-df74-ab69-b218016cbae0@matthias-runge.de> On 03/07/2020 11:08, Rong Zhu wrote: > Welcome Matthias, I have added you to the ceilometer core team. > > Lingxian Kong >于2020 > 年6月24日 周三09:57写道: Thank you, I feel honored. Matthias > > +1 welcome! > > --- > Lingxian Kong > Senior Software Engineer > Catalyst Cloud > www.catalystcloud.nz > > > On Tue, Jun 23, 2020 at 11:47 PM Rong Zhu > wrote: > > Hello all, > > Matthias Runge have been very active in the repository with > patches and reviews. > So I would like to propose adding Matthias as core developer for > the telemetry project. > > Please, feel free to add your votes into the thread. > -- > Thanks, > Rong Zhu > > -- > Thanks, > Rong Zhu From mrunge at matthias-runge.de Fri Jul 3 10:10:29 2020 From: mrunge at matthias-runge.de (Matthias Runge) Date: Fri, 3 Jul 2020 12:10:29 +0200 Subject: [Telemetry] Error when sending to prometheus pushgateway In-Reply-To: References: Message-ID: <731c90df-8830-1804-10a8-a9a97a3e2f55@matthias-runge.de> On 03/07/2020 11:25, Samuel Mutel wrote: > Hello, > > I have two questions about ceilometer (openstack version rocky). > > * First of all, it seems that ceilometer is sending metrics every hour > and I don't understand why. > * Next, I am not able to setup ceilometer to send metrics to > prometheus pushgateway. 
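[Editorial note] Two things are worth ruling out here. In Rocky the polling cadence is read by ceilometer-polling from polling.yaml, while pipeline.yaml (read by the notification agent) only routes samples to sinks and publishers; if the interval block lives only in the pipeline file it may simply be ignored, leaving polling at the default cadence. Separately, the 400 from the pushgateway may be the Prometheus text format rejecting meter names that contain dots (metric names only allow letters, digits, underscores and colons), which the pushgateway's own logs should confirm. A minimal sketch of the usual file split, reusing the "*" meter list and publisher URL from the message above:

# polling.yaml (read by ceilometer-polling): the interval here controls how often meters are polled
---
sources:
  - name: all_pollsters
    interval: 300                  # placeholder value, in seconds
    meters:
      - "*"

# pipeline.yaml (read by the notification agent): routes the resulting samples to publishers
---
sources:
  - name: meter_source
    meters:
      - "*"
    sinks:
      - prometheus
sinks:
  - name: prometheus
    publishers:
      - prometheus://10.60.4.11:9091/metrics/job/ceilometer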
> > Here is my configuration: > > sources: >   - name: meter_file >     interval: 30 >     meters: >       - "*" >     sinks: >       - prometheus > > sinks: >   - name: prometheus >     publishers: >             - prometheus://10.60.4.11:9091/metrics/job/ceilometer > > > > Here is the error I received: > > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 > # TYPE memory gauge > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2048 > # TYPE disk.ephemeral.size gauge > disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} > 0 > # TYPE disk.root.size gauge > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > : HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > Traceback (most recent call last): > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http   File > "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", > line 178, in _do_post > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http     > res.raise_for_status() > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http   File > "/usr/lib/python2.7/dist-packages/requests/models.py", line 935, in > raise_for_status > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http     > raise HTTPError(http_error_msg, response=self) > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > > > Thanks for your help on this topic. Hi, first obvious question: are you sure that there is something listening under http://10.60.4.11:9091/metrics/job/ceilometer ? Would you have some error logs from the other side? It seems that ceilometer is trying to dispatch as expected. Matthias From anost1986 at gmail.com Fri Jul 3 11:20:20 2020 From: anost1986 at gmail.com (Andrii Ostapenko) Date: Fri, 3 Jul 2020 06:20:20 -0500 Subject: [loci][helm][k8s] When do images on docker.io get updated Message-ID: Hello Corne, OSH uses images built using gates in openstack/openstack-helm-images repository, not in loci itself. You may want to add a definition for watcher image similar to [0] and then refer to it in the corresponding release job, e.g. for Stein [1]. After your commit to openstack-helm-images is merged, new images will be published to docker.io/openstackhelm/watcher repository and can be used in the way you referenced them in your OSH commit [2]. [0] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L269-L279 [1] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L454-L481 [2] https://review.opendev.org/#/c/720140/ From info at dantalion.nl Fri Jul 3 12:34:28 2020 From: info at dantalion.nl (info at dantalion.nl) Date: Fri, 3 Jul 2020 14:34:28 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: References: Message-ID: Hello Andrii, I understand, this is unfortunate however as when I previously asked I was told that it could be achieved both using loci or openstack-helm-images. Seeing how the loci patch took around 7 months to merge I have now faced quite some delays. I will submit the patch to openstack-helm-images soon, thanks for clarifying. PS: Octavia is still using loci images for openstack-helm is that something that should be updated? 
https://opendev.org/openstack/openstack-helm/src/branch/master/octavia/values.yaml#L54 King regards, Corne Lukken On 03-07-2020 13:20, Andrii Ostapenko wrote: > Hello Corne, > > OSH uses images built using gates in openstack/openstack-helm-images > repository, not in loci itself. You may want to add a definition for > watcher image similar to [0] and then refer to it in the corresponding > release job, e.g. for Stein [1]. > > After your commit to openstack-helm-images is merged, new images will > be published to docker.io/openstackhelm/watcher repository and can be > used in the way you referenced them in your OSH commit [2]. > > [0] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L269-L279 > [1] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L454-L481 > [2] https://review.opendev.org/#/c/720140/ > From amotoki at gmail.com Fri Jul 3 13:39:28 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Fri, 3 Jul 2020 22:39:28 +0900 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 10:37 PM Ruby Loo wrote: > > Hi, > > On Tue, Jun 30, 2020 at 10:53 PM Akihiro Motoki wrote: >> >> On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona wrote: >> > >> > Hi, >> > Simplification sounds good (I do not take into considerations like "no code fanatic movements" or similar). >> > How this could affect upgrade, I am sure there are deployments older than pike, and those at a point will >> > got for some newer version (I hope we can give them good answers for their problems as Openstack) >> > >> > What do you think about stadium projects? As those have much less activity (as mostly solve one rather specific problem), >> > and much less migration scripts shall we just "merge" those to init ops? >> > I checked quickly a few stadium project and only bgpvpn has newer migration scripts than pike. >> >> In my understanding, squashing migrations can be done repository by repository. >> A revision hash of each migration is not changed and head revisions >> are stored in the database per repository, so it should work. >> For initial deployments, neutron-db-manage runs all db migrations from >> the initial revision to a specified revision (release), so it has no >> problem. >> For upgrade scenarios, this change just means that we just dropped >> support upgrade from releases included in squashed migrations. >> For example, if we squash migrations up to rocky (and create >> rocky_initial migration) in the neutron repo, we no longer support db >> migration from releases before rocky. This would be the only >> difference I see. > > > > I wonder if this is acceptable (that an OpenStack service will not support db migrations prior to rocky). What is (or is there?) OpenStack's stance wrt support for upgrades? We are using ocata and plan on upgrading but we don't know when that might happen :-( > > --ruby It is not true. What we the upstream community recommend is to upgrade the controller node and databases in the fast-foward upgrade manner. Even if the upstream repository just provides database migration from for example Rocky, you can upgrade from a release older than rocky, by upgrading one release by one. In addition, by keeping a specific number of releases in db migrations, operators can still upgrade from more than one old release (if they want). 
--amotoki From gmann at ghanshyammann.com Fri Jul 3 14:48:50 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 03 Jul 2020 09:48:50 -0500 Subject: [TC] [all] OSU Intern Work In-Reply-To: References: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> Message-ID: <1731526ca3b.11b1ddef7398808.2475388787302983606@ghanshyammann.com> ---- On Thu, 02 Jul 2020 17:18:49 -0500 Kendall Nelson wrote ---- > > > On Thu, Jul 2, 2020 at 3:08 PM Ghanshyam Mann wrote: > ---- On Thu, 02 Jul 2020 16:45:00 -0500 Kendall Nelson wrote ---- > > Hello! > > As you may or may not know, the OSF funded a student at Oregon State University last year to work on OpenStack part time. He did a lot of amazing work on Glance but sadly we are coming to the end of his internship as he will be graduating soon. I'm happy to report that we have the budget to fund another student part time to work on OpenStack again and I wanted to collect suggestions of projects/areas that a student could be helpful in. > > It is important to note, that they will only be working part time and, while I will be helping to mentor them, I will likely need a co-mentor in the area/topic to help me get them going, get their patches reviewed, answer questions as they go etc. > > Originally, I had thought about assigning them to Glance (like this past year) or Designate (like we had considered last year), but now I am thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a better fit? If you are interested in helping mentor a student in any of those areas or have a better idea I am all ears :) > > I look forward to your suggestions. > > Thanks Kendal for starting this. > > +100 for OSC help. > > Also we should consider upstream-investment-opportunities list which is our help needed things in community and > we really look for some help on that since starting. For example, help on 'Consistent and Secure Policy Defaults' can > be good thing to contribute which is a popup team in this cycle too[2], Raildo and myself can help for mentorship in this. > > I will definitely take a look at the list, but my understanding was that we wanted someone to work on those things that would be sticking around a little more long term and full time? I can only guarantee the student will be around for the school year and only part time. > If I'm wrong, I can definitely rank the policy work a little higher on my list :) Thanks, part time help will be valuable too in policy work. For example, doing it for 1-2 projects (who has small set of policies) can be good progress. -gmann > > [1] https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2020/index.html > [2] https://governance.openstack.org/tc/reference/popup-teams.html#secure-default-policies > > -gmann > > > > -Kendall (diablo_rojo) > > -Kendall (diablo_rojo) From ildiko.vancsa at gmail.com Fri Jul 3 14:52:50 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Fri, 3 Jul 2020 16:52:50 +0200 Subject: [cyborg] Incomplete v2 API in Train Message-ID: <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> Hi Cyborg Team, I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration resources. We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. 
If my understanding is correct the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be possible to bring this up and discuss on an upcoming Cyborg team meeting? Thanks, Ildikó [1] https://www.lfnetworking.org/about/cntt/ [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 From mnaser at vexxhost.com Fri Jul 3 16:05:21 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 3 Jul 2020 12:05:21 -0400 Subject: [TC] Monthly Meeting Summary Message-ID: Hi everyone, Here’s a summary of what happened in our TC monthly meeting last Thursday, July 2nd. # ATTENDEES (LINES SAID) - mnaser (106) - gmann (43) - evrardjp (32) - diablo_rojo (32) - njohnston (11) - jungleboyj (8) - openstack (7) - ricolin (6) - fungi (4) - ttx (2) - clarkb (2) - belmoreira (1) - knikolla (1) - AJaeger (1) # MEETING SUMMARY - Rollcall (mnaser, 14:00:40) - Follow up on past action items (mnaser, 14:04:59) - OpenStack Foundation OSU Intern Project (diablo_rojo) (mnaser, 14:26:55) - W cycle goal selection start (mnaser, 14:36:02) - https://governance.openstack.org/tc/goals/#goal-selection-schedule (gmann, 14:37:37) - Completion of retirement cleanup (gmann) (mnaser, 14:48:36) - https://etherpad.opendev.org/p/tc-retirement-cleanup is a scratch pad; nothing pushed out towards the community (mnaser, 14:48:59) # ACTION ITEMS - evrardjp & njohnston to start writing resolution about how deconstructed PTL role - mnaser to find the owner to start using facing API pop-up team over ML - gmann update goal selection docs to clarify the goal count - gmann start discussion around reviewing currenet tags - mnaser propose change to implement weekly meetings - diablo_rojo start discussion on ML around potential items for OSF funded intern - njohnston and mugsie to work on getting goals groomed/proposed for W cycle - TC and community to help finish properly and cleanly retiring projects To read the full logs of the meeting, please refer to http://eavesdrop.openstack.org/meetings/tc/2020/tc.2020-07-02-14.00.log.html. -- Mohammed Naser VEXXHOST, Inc. From ionut at fleio.com Fri Jul 3 16:19:29 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 3 Jul 2020 19:19:29 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi Rafael, I think I applied all the reviews successfully but I tried to do an octavia dynamic poller but I have couples of errors. Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ if i remove the - in front of name like this: https://paste.xinu.at/K7s5I8/ The error is different this time: https://paste.xinu.at/zWdC0U/ Is there something I missed or is something wrong in yaml? On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > > Since the merging window for ussuri was long passed for those commits, is >> it safe to assume that it will not land in stable/ussuri at all and those >> will be available for victoria? >> > > I would say so. We are lacking people to review and then merge it. > > How safe is to cherry pick those commits and use them in production? >> > As long as the person executing the cherry-picks, and maintaining the code > knows what she/he is doing, you should be safe. 
The guys that are using > this implementation (and others that I and my colleagues proposed), have a > few openstack components that are customized with the > patches/enhancements/extensions we developed so far; this means, they are > not using the community version, but something in-between (the community > releases + the patches we did). Of course, it is only possible, because we > are the ones creating and maintaining these codes; therefore, we can assure > quality for production. > > > > > On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: > >> Hello Rafael, >> >> Since the merging window for ussuri was long passed for those commits, is >> it safe to assume that it will not land in stable/ussuri at all and those >> will be available for victoria? >> >> How safe is to cherry pick those commits and use them in production? >> >> >> >> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>> However, there are some important PRs still waiting for a merge, that might >>> be important for your use case: >>> * https://review.opendev.org/#/c/722092/ >>> * https://review.opendev.org/#/c/715180/ >>> * https://review.opendev.org/#/c/715289/ >>> * https://review.opendev.org/#/c/679999/ >>> * https://review.opendev.org/#/c/709807/ >>> >>> >>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves >>> wrote: >>> >>>> >>>> >>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>> >>>>> Hello, >>>>> >>>>> I want to meter the loadbalancer into gnocchi for billing purposes in >>>>> stein/train and ceilometer doesn't support dynamic pollsters. >>>>> >>>> >>>> I think I misunderstood your use case, sorry. I read it as if you >>>> wanted to know "if a loadbalancer was deployed and has status active". >>>> >>>> >>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>> >>>> >>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>>> Ceilometer project. >>>> >>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>> cgoncalves at redhat.com> wrote: >>>>> >>>>>> Hi Ionut, >>>>>> >>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>>>> >>>>>>> Hello guys, >>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer the >>>>>>> following: >>>>>>> - network.services.lb.active.connections >>>>>>> - network.services.lb.health_monitor >>>>>>> - network.services.lb.incoming.bytes >>>>>>> - network.services.lb.listener >>>>>>> - network.services.lb.loadbalancer >>>>>>> - network.services.lb.member >>>>>>> - network.services.lb.outgoing.bytes >>>>>>> - network.services.lb.pool >>>>>>> - network.services.lb.total.connections >>>>>>> >>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>> supported in neutron. >>>>>>> >>>>>>> I found >>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>> but this is not available in stein or train. >>>>>>> >>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>> octavia. >>>>>>> I mostly want for start to just meter if a loadbalancer was deployed >>>>>>> and has status active. >>>>>>> >>>>>> >>>>>> You can get the provisioning and operating status of Octavia load >>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>> three API endpoints for statistics [2][3][4]. >>>>>> >>>>>> I hope this helps with your use case. 
>>>>>> >>>>>> Cheers, >>>>>> Carlos >>>>>> >>>>>> [1] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>> [2] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>> [3] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>> [4] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jul 3 16:32:55 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 3 Jul 2020 16:32:55 +0000 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: References: Message-ID: <20200703163255.rlotrtjbwjlxwt4o@yuggoth.org> On 2020-07-03 14:34:28 +0200 (+0200), info at dantalion.nl wrote: [...] > I understand, this is unfortunate however as when I previously > asked I was told that it could be achieved both using loci or > openstack-helm-images. Seeing how the loci patch took around 7 > months to merge I have now faced quite some delays. > > I will submit the patch to openstack-helm-images soon, thanks for > clarifying. [...] Be aware that the loci team basically dissolved a year or two back and development mostly ground to a halt. The loci deliverable was folded into the openstack-helm team a couple months ago because they still depend on it in some places, so they've committed to keep it on life support for now. At this point I would assume whatever the openstack-helm team is focusing on will receive better support than loci, unless they're actively working to use loci more. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ionut at fleio.com Fri Jul 3 16:59:37 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 3 Jul 2020 19:59:37 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi, I just noticed that the example dynamic.network.services.vpn.connection from https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has the wrong indentation. This https://paste.xinu.at/6PTfsM/ is loaded without any error. Now I have to see why is not polling from it On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: > Hi Rafael, > > I think I applied all the reviews successfully but I tried to do an > octavia dynamic poller but I have couples of errors. > > Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ > Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ > > if i remove the - in front of name like this: > https://paste.xinu.at/K7s5I8/ > The error is different this time: https://paste.xinu.at/zWdC0U/ > > Is there something I missed or is something wrong in yaml? 
> > > On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> >> Since the merging window for ussuri was long passed for those commits, is >>> it safe to assume that it will not land in stable/ussuri at all and those >>> will be available for victoria? >>> >> >> I would say so. We are lacking people to review and then merge it. >> >> How safe is to cherry pick those commits and use them in production? >>> >> As long as the person executing the cherry-picks, and maintaining the >> code knows what she/he is doing, you should be safe. The guys that are >> using this implementation (and others that I and my colleagues proposed), >> have a few openstack components that are customized with the >> patches/enhancements/extensions we developed so far; this means, they are >> not using the community version, but something in-between (the community >> releases + the patches we did). Of course, it is only possible, because we >> are the ones creating and maintaining these codes; therefore, we can assure >> quality for production. >> >> >> >> >> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >> >>> Hello Rafael, >>> >>> Since the merging window for ussuri was long passed for those commits, >>> is it safe to assume that it will not land in stable/ussuri at all and >>> those will be available for victoria? >>> >>> How safe is to cherry pick those commits and use them in production? >>> >>> >>> >>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>> However, there are some important PRs still waiting for a merge, that might >>>> be important for your use case: >>>> * https://review.opendev.org/#/c/722092/ >>>> * https://review.opendev.org/#/c/715180/ >>>> * https://review.opendev.org/#/c/715289/ >>>> * https://review.opendev.org/#/c/679999/ >>>> * https://review.opendev.org/#/c/709807/ >>>> >>>> >>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves >>>> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I want to meter the loadbalancer into gnocchi for billing purposes in >>>>>> stein/train and ceilometer doesn't support dynamic pollsters. >>>>>> >>>>> >>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>> >>>>> >>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>> >>>>> >>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>>>> Ceilometer project. >>>>> >>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>> cgoncalves at redhat.com> wrote: >>>>>> >>>>>>> Hi Ionut, >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>>>>> >>>>>>>> Hello guys, >>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>> the following: >>>>>>>> - network.services.lb.active.connections >>>>>>>> - network.services.lb.health_monitor >>>>>>>> - network.services.lb.incoming.bytes >>>>>>>> - network.services.lb.listener >>>>>>>> - network.services.lb.loadbalancer >>>>>>>> - network.services.lb.member >>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>> - network.services.lb.pool >>>>>>>> - network.services.lb.total.connections >>>>>>>> >>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>> supported in neutron. 
>>>>>>>> >>>>>>>> I found >>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>> but this is not available in stein or train. >>>>>>>> >>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>> octavia. >>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>> deployed and has status active. >>>>>>>> >>>>>>> >>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>>> three API endpoints for statistics [2][3][4]. >>>>>>> >>>>>>> I hope this helps with your use case. >>>>>>> >>>>>>> Cheers, >>>>>>> Carlos >>>>>>> >>>>>>> [1] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>> [2] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>> [3] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>> [4] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Fri Jul 3 19:13:04 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 03 Jul 2020 12:13:04 -0700 Subject: Setuptools 48 and Devstack Failures Message-ID: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> Hello, Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. 
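[Editorial note] The "syntax error near name" reported above is usually the list-item indentation: each pollster must start as a list entry ("- name: ...") with all of its remaining keys indented two spaces under that entry. A minimal sketch of an Octavia load balancer pollster in that shape follows; the endpoint type, URL path, attribute names and response_entries_key are assumptions to check against the dynamic pollster documentation and the Octavia API your cloud exposes, not a tested definition:

---
- name: "dynamic.network.services.loadbalancer"
  sample_type: "gauge"
  unit: "loadbalancer"
  endpoint_type: "load-balancer"         # assumed Keystone service type for Octavia
  url_path: "v2/lbaas/loadbalancers"     # assumed; adjust to the API root of your endpoint
  response_entries_key: "loadbalancers"  # assumed; only if the cherry-picked patches support it
  value_attribute: "provisioning_status"
  value_mapping:
    ACTIVE: "1"
    ERROR: "0"
  default_value: 0
  metadata_fields:
    - "name"
    - "vip_address"
    - "provider"

Note that a dynamic pollster only produces samples when its name (here dynamic.network.services.loadbalancer) is also listed in polling.yaml, which is worth checking when a definition loads cleanly but nothing is polled.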
Clark From rafaelweingartner at gmail.com Fri Jul 3 22:09:40 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Fri, 3 Jul 2020 19:09:40 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Good catch. I fixed the docs. https://review.opendev.org/#/c/739288/ On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: > Hi, > > I just noticed that the example dynamic.network.services.vpn.connection > from > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has > the wrong indentation. > This https://paste.xinu.at/6PTfsM/ is loaded without any error. > > Now I have to see why is not polling from it > > On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: > >> Hi Rafael, >> >> I think I applied all the reviews successfully but I tried to do an >> octavia dynamic poller but I have couples of errors. >> >> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >> >> if i remove the - in front of name like this: >> https://paste.xinu.at/K7s5I8/ >> The error is different this time: https://paste.xinu.at/zWdC0U/ >> >> Is there something I missed or is something wrong in yaml? >> >> >> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> >>> Since the merging window for ussuri was long passed for those commits, >>>> is it safe to assume that it will not land in stable/ussuri at all and >>>> those will be available for victoria? >>>> >>> >>> I would say so. We are lacking people to review and then merge it. >>> >>> How safe is to cherry pick those commits and use them in production? >>>> >>> As long as the person executing the cherry-picks, and maintaining the >>> code knows what she/he is doing, you should be safe. The guys that are >>> using this implementation (and others that I and my colleagues proposed), >>> have a few openstack components that are customized with the >>> patches/enhancements/extensions we developed so far; this means, they are >>> not using the community version, but something in-between (the community >>> releases + the patches we did). Of course, it is only possible, because we >>> are the ones creating and maintaining these codes; therefore, we can assure >>> quality for production. >>> >>> >>> >>> >>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>> >>>> Hello Rafael, >>>> >>>> Since the merging window for ussuri was long passed for those commits, >>>> is it safe to assume that it will not land in stable/ussuri at all and >>>> those will be available for victoria? >>>> >>>> How safe is to cherry pick those commits and use them in production? >>>> >>>> >>>> >>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> The dynamic pollster in Ceilometer will be first released in Ussuri. 
>>>>> However, there are some important PRs still waiting for a merge, that might >>>>> be important for your use case: >>>>> * https://review.opendev.org/#/c/722092/ >>>>> * https://review.opendev.org/#/c/715180/ >>>>> * https://review.opendev.org/#/c/715289/ >>>>> * https://review.opendev.org/#/c/679999/ >>>>> * https://review.opendev.org/#/c/709807/ >>>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>> cgoncalves at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I want to meter the loadbalancer into gnocchi for billing purposes >>>>>>> in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>> >>>>>> >>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>> >>>>>> >>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>> >>>>>> >>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>>>>> Ceilometer project. >>>>>> >>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>> cgoncalves at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Ionut, >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello guys, >>>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>>> the following: >>>>>>>>> - network.services.lb.active.connections >>>>>>>>> - network.services.lb.health_monitor >>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>> - network.services.lb.listener >>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>> - network.services.lb.member >>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>> - network.services.lb.pool >>>>>>>>> - network.services.lb.total.connections >>>>>>>>> >>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>> supported in neutron. >>>>>>>>> >>>>>>>>> I found >>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>> but this is not available in stein or train. >>>>>>>>> >>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>> octavia. >>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>> deployed and has status active. >>>>>>>>> >>>>>>>> >>>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>>>> three API endpoints for statistics [2][3][4]. >>>>>>>> >>>>>>>> I hope this helps with your use case. 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> Carlos >>>>>>>> >>>>>>>> [1] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>> [2] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>> [3] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>> [4] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jul 3 22:29:18 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 03 Jul 2020 17:29:18 -0500 Subject: Setuptools 48 and Devstack Failures In-Reply-To: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> Message-ID: <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> ---- On Fri, 03 Jul 2020 14:13:04 -0500 Clark Boylan wrote ---- > Hello, > > Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. > > Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. > > Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. Yeah, I am not sure how it will go as setuptools bug or an incompatible change and needs to handle on devstack side. As this is blocking all gates, let's use the old setuptools temporarily. 
For now, I filed devstack bug to track it and once we figure it out then move to latest setuptools - https://bugs.launchpad.net/devstack/+bug/1886237 This is patch to use old setuptools- - https://review.opendev.org/#/c/739290/ > > Clark > > From gmann at ghanshyammann.com Sun Jul 5 01:24:55 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 04 Jul 2020 20:24:55 -0500 Subject: Setuptools 48 and Devstack Failures In-Reply-To: <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> Message-ID: <1731c9381f9.c3ec7029419955.5239287898505413558@ghanshyammann.com> ---- On Fri, 03 Jul 2020 17:29:18 -0500 Ghanshyam Mann wrote ---- > ---- On Fri, 03 Jul 2020 14:13:04 -0500 Clark Boylan wrote ---- > > Hello, > > > > Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. > > > > Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. > > > > Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. > > Yeah, I am not sure how it will go as setuptools bug or an incompatible change and needs to handle on devstack side. > As this is blocking all gates, let's use the old setuptools temporarily. For now, I filed devstack bug to track > it and once we figure it out then move to latest setuptools - https://bugs.launchpad.net/devstack/+bug/1886237 > > This is patch to use old setuptools- > - https://review.opendev.org/#/c/739290/ Updates: Issue is when setuptools adopts distutils from the standard library (in 48.0.0) and uses it, downstream packagers customization to distutils will be lost. - https://github.com/pypa/setuptools/issues/2232 setuptools 49.1.0 reverted the adoption of distutils from the standard library and its working now. I have closed the devstack bug 1886237 and proposed the revert of capping of setuptools by blacklisting 48.0.0 and 49.0.0 so that we test with latest setuptools. For now, devstack will pick the 49.1.0 and pass. - https://review.opendev.org/#/c/739294/2 In summary, gate is green and you can recheck on the failed patches. 
-gmann > > > > > Clark > > > > > > From gmann at ghanshyammann.com Sun Jul 5 18:36:48 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 05 Jul 2020 13:36:48 -0500 Subject: [all][tc][goals] Migrate CI/CD jobs to new Ubuntu LTS Focal: Week R-15 Update Message-ID: <17320443796.124ad0a06441329.7461862692885839223@ghanshyammann.com> Hello Everyone, Please find the week R-15 updates on 'Ubuntu Focal migration' community goal. Tracking: https://storyboard.openstack.org/#!/story/2007865 Progress: ======= * I have prepared the patched to migrate the unit/functional/doc/cover tox jobs to focal which are WIP till we finish the project side testing. This and its base patches - https://review.opendev.org/#/c/738328/ * devstack and tempest base patches are changed with Depends-On on 738328 tox job patch. This way we can test the complete gate (integration + unit +functional + doc + cover +pep8 + lower-constraint) jobs with a single testing patch by doing Depends-On: https://review.opendev.org/#/c/734700/ (or devstack base patch or tox one if you do not have tempest jobs to test) * I have started a few more project testing and found bugs on the incompatible deps versions for Focal. Please refer to the 'Bugs Report' section for details. Bugs Report: ========== Summary: Total 4 (1 fixed, 3 in-progress). 1. Bug#1882521. (IN-PROGRESS) There is open bug for nova/cinder where three tempest tests are failing for volume detach operation. There is no clear root cause found yet -https://bugs.launchpad.net/cinder/+bug/1882521 We have skipped the tests in tempest base patch to proceed with the other projects testing but this is blocking things for the migration. 2. We encountered the nodeset name conflict with x/tobiko. (FIXED) nodeset conflict is resolved now and devstack provides all focal nodes now. 3. Bug#1886296. (IN-PROGRESS) pyflakes till 2.1.0 is not compatible with python 3.8 which is the default python version on ubuntu focal[1]. With pep8 job running on focal faces the issue and fail. We need to bump the pyflakes to 2.1.1 as min version to run pep8 jobs on py3.8. As of now, many projects are using old hacking version so I am explicitly adding pyflakes>=2.1.1 on the project side[2] but for the long term easy maintenance, I am doing it in 'hacking' requirements.txt[3] nd will release a new hacking version. After that project can move to new hacking and do not need to maintain pyflakes version compatibility. 4. Bug#1886298. (IN-PROGRESS) 'Markupsafe' 1.0 is not compatible with the latest version of setuptools[4], We need to bump the lower-constraint for Markupsafe to 1.1.1 to make it work. There are a few more issues[5] with lower-constraint jobs which I am debugging. What work to be done on the project side: ================================ This goal is more of testing the jobs on focal and fixing bugs if any otherwise migrate jobs by switching the nodeset to focal node sets defined in devstack. 1. Start a patch in your repo by making depends-on on either of below: devstack base patch if you are using only devstack base jobs not tempest: https://review.opendev.org/#/c/731207/ OR tempest base patch if you are using the tempest base job (like devstack-tempest): https://review.opendev.org/#/c/734700/ Example: https://review.opendev.org/#/c/738126/ 2. If none of your project jobs override the nodeset then above patch will be testing patch(do not merge) otherwise change the nodeset to focal. Example: https://review.opendev.org/#/c/737370/ 3. 
If the jobs are defined in branchless repo and override the nodeset then you need to override the branches variant to adjust the nodeset so that those jobs run on Focal on victoria onwards only. If no nodeset is overridden then devstack being branched and stable base job using bionic/xenial will take care of this. Once we finish the testing on projects side and no failure then we will merge the devstack and tempest base patches. Important things to note: =================== * Do not forgot to add the story and task link to your patch so that we can track it smoothly. * Use gerrit topic 'migrate-to-focal' * Do not backport any of the patches. References: ========= Goal doc: https://governance.openstack.org/tc/goals/selected/victoria/migrate-ci-cd-jobs-to-ubuntu-focal.html Storyboard tracking: https://storyboard.openstack.org/#!/story/2007865 [1] https://github.com/PyCQA/pyflakes/issues/367 [2] https://review.opendev.org/#/c/739315/ [3] https://review.opendev.org/#/c/739334/ [4] https://github.com/pallets/markupsafe/issues/116 [5] https://zuul.opendev.org/t/openstack/build/7ecd9cf100194bc99b3b70fa1e6de032 -gmann From hongbin034 at gmail.com Sun Jul 5 19:47:58 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Sun, 5 Jul 2020 15:47:58 -0400 Subject: [Neutron] Bug Deputy Report (June 29 - July 05) Message-ID: Hi all, Below is the bug deputy report for last week. Critical: * https://bugs.launchpad.net/neutron/+bug/1885900 test_trunk_subport_lifecycle is failing in ovn based jobs * https://bugs.launchpad.net/neutron/+bug/1885899 test_qos_basic_and_update test is failing High: * https://bugs.launchpad.net/neutron/+bug/1886116 slaac no longer works on IPv6 tenant subnets * https://bugs.launchpad.net/neutron/+bug/1885898 test connectivity through 2 routers fails in neutron-ovn-tempest-full-multinode-ovs-master job * https://bugs.launchpad.net/neutron/+bug/1885897 Tempest test_create_router_set_gateway_with_fixed_ip test is failing often in dvr scenario job * https://bugs.launchpad.net/neutron/+bug/1885695 [OVS] "vsctl" implementation does not allow empty transactions Medium: * https://bugs.launchpad.net/neutron/+bug/1885891 DB exception when updating a "ml2_port_bindings" object * https://bugs.launchpad.net/neutron/+bug/1885758 RPCMessage timeouts when ovs agent is reporting status about many ports Low: * https://bugs.launchpad.net/neutron/+bug/1886216 keepalived-state-change does not format correctly the logs * https://bugs.launchpad.net/neutron/+bug/1885547 [fullstack] OVS interface events isolation error with more than one OVS agent RFE: * https://bugs.launchpad.net/neutron/+bug/1885921 [RFE][floatingip port_forwarding] Add port ranges -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Mon Jul 6 02:14:41 2020 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Mon, 6 Jul 2020 02:14:41 +0000 Subject: =?utf-8?B?562U5aSNOiBbbGlzdHMub3BlbnN0YWNrLm9yZ+S7o+WPkV1bY3lib3JnXSBJ?= =?utf-8?Q?ncomplete_v2_API_in_Train?= In-Reply-To: <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> References: <68eefdd8dbf1a67a74233c0d02e5b6d8@sslemail.net> <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> Message-ID: Ildik, Cyborg officially completed the V2 version switch from the Ussuri version [1][2], and introduced microversion, you can refer to [3] or more information about the latest Cyborg V2 API. Sorry, we did not backport Device & Deployable V2 API to Train version. 
[1]https://specs.openstack.org/openstack/cyborg-specs/specs/ussuri/approved/cyborg-api.html [2]https://review.opendev.org/#/c/695648/, https://review.opendev.org/#/c/712835/ [3]https://docs.openstack.org/api-ref/accelerator/v2/index.html -----邮件原件----- 发件人: Ildiko Vancsa [mailto:ildiko.vancsa at gmail.com] 发送时间: 2020年7月3日 22:53 收件人: OpenStack Discuss 主题: [lists.openstack.org代发][cyborg] Incomplete v2 API in Train Hi Cyborg Team, I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration resources. We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be possible to bring this up and discuss on an upcoming Cyborg team meeting? Thanks, Ildikó [1] https://www.lfnetworking.org/about/cntt/ [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 From yumeng_bao at yahoo.com Mon Jul 6 03:09:08 2020 From: yumeng_bao at yahoo.com (yumeng bao) Date: Mon, 6 Jul 2020 11:09:08 +0800 Subject: [cyborg] Incomplete v2 API in Train References: <41297731-AD04-4A0E-9E21-56DD3FF90885.ref@yahoo.com> Message-ID: <41297731-AD04-4A0E-9E21-56DD3FF90885@yahoo.com>  Hi Ildikó, > Hi Cyborg Team, > I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration > resources. > We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct > the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. Yes,your understanding is correct,the v2 API implementation in Train is partial. I would update the documentation soon. I would recommend you to use the stable/ussuri version(instead of train release) of cyborg for two reasons: 1) API V2 in ussuri is complete while that of train is incomplete 2)the nova-cyborg integration[3] was not landed until Ussuri[4],so the integration in Train is also partial[5]. So if CNTT wants the complete accelerator management function,it would be better to use cyborg ussuri. > The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be > possible to bring this up and discuss on an upcoming Cyborg team meeting? Yes, sure. We can bring this up on the next weekly meeting on this Thursday 03:00 UTC at #openstack-cyborg, I have added this to meeting agenda[6]. 
> Thanks, > Ildikó > [1] https://www.lfnetworking.org/about/cntt/ > [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 [3]https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/nova-cyborg-interaction.html [4]https://releases.openstack.org/ussuri/highlights.html#cyborg [5]https://releases.openstack.org/train/highlights.html#cyborg [6]https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda Regards, Yumeng From ildiko.vancsa at gmail.com Mon Jul 6 06:30:26 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Mon, 6 Jul 2020 08:30:26 +0200 Subject: [cyborg] Incomplete v2 API in Train In-Reply-To: <41297731-AD04-4A0E-9E21-56DD3FF90885@yahoo.com> References: <41297731-AD04-4A0E-9E21-56DD3FF90885.ref@yahoo.com> <41297731-AD04-4A0E-9E21-56DD3FF90885@yahoo.com> Message-ID: <99AD36C6-98D1-4E2F-A811-002EF14D6944@gmail.com> Hi Yumeng, Thank you for the information. I will also attend the meeting this Thursday to get a full understanding and plans to work together with CNTT on acceleration management. Thanks, Ildikó > On Jul 6, 2020, at 05:09, yumeng bao wrote: > > > Hi Ildikó, > >> Hi Cyborg Team, > >> I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration > resources. > >> We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct > the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. > > Yes,your understanding is correct,the v2 API implementation in Train is partial. I would update the documentation soon. > I would recommend you to use the stable/ussuri version(instead of train release) of cyborg for two reasons: 1) API V2 in ussuri is complete while that of train is incomplete 2)the nova-cyborg integration[3] was not landed until Ussuri[4],so the integration in Train is also partial[5]. So if CNTT wants the complete accelerator management function,it would be better to use cyborg ussuri. > >> The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be > possible to bring this up and discuss on an upcoming Cyborg team meeting? > > Yes, sure. We can bring this up on the next weekly meeting on this Thursday 03:00 UTC at #openstack-cyborg, I have added this to meeting agenda[6]. 
> >> Thanks, >> Ildikó > >> [1] https://www.lfnetworking.org/about/cntt/ >> [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 > > > > [3]https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/nova-cyborg-interaction.html > [4]https://releases.openstack.org/ussuri/highlights.html#cyborg > [5]https://releases.openstack.org/train/highlights.html#cyborg > [6]https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda > > > Regards, > Yumeng > From ildiko.vancsa at gmail.com Mon Jul 6 06:31:11 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Mon, 6 Jul 2020 08:31:11 +0200 Subject: =?utf-8?B?UmU6IFtsaXN0cy5vcGVuc3RhY2sub3Jn5Luj5Y+RXVtjeWJvcmdd?= =?utf-8?B?IEluY29tcGxldGUgdjIgQVBJIGluIFRyYWlu?= In-Reply-To: References: <68eefdd8dbf1a67a74233c0d02e5b6d8@sslemail.net> <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> Message-ID: Hi Brin, Thank you for the information, I will read through the links you provided and get back if I have more questions. Thanks, Ildikó > On Jul 6, 2020, at 04:14, Brin Zhang(张百林) wrote: > > Ildik, > > Cyborg officially completed the V2 version switch from the Ussuri version [1][2], and introduced microversion, you can refer to [3] or more information about the latest Cyborg V2 API. Sorry, we did not backport Device & > Deployable V2 API to Train version. > > [1]https://specs.openstack.org/openstack/cyborg-specs/specs/ussuri/approved/cyborg-api.html > [2]https://review.opendev.org/#/c/695648/, https://review.opendev.org/#/c/712835/ > [3]https://docs.openstack.org/api-ref/accelerator/v2/index.html > > -----邮件原件----- > 发件人: Ildiko Vancsa [mailto:ildiko.vancsa at gmail.com] > 发送时间: 2020年7月3日 22:53 > 收件人: OpenStack Discuss > 主题: [lists.openstack.org代发][cyborg] Incomplete v2 API in Train > > Hi Cyborg Team, > > I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration resources. > > We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. > > The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be possible to bring this up and discuss on an upcoming Cyborg team meeting? > > Thanks, > Ildikó > > [1] https://www.lfnetworking.org/about/cntt/ > [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 > > > From katonalala at gmail.com Mon Jul 6 07:11:32 2020 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 6 Jul 2020 09:11:32 +0200 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: Hi, Exactly, it is not allowed to backport such a change to rocky for example, so on older branches the migration scripts will be there as I see, and you can upgrade to a release which support migration to Victoria for example. Regards lajoskatona Akihiro Motoki ezt írta (időpont: 2020. júl. 
3., P, 15:39): > On Thu, Jul 2, 2020 at 10:37 PM Ruby Loo wrote: > > > > Hi, > > > > On Tue, Jun 30, 2020 at 10:53 PM Akihiro Motoki > wrote: > >> > >> On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona > wrote: > >> > > >> > Hi, > >> > Simplification sounds good (I do not take into considerations like > "no code fanatic movements" or similar). > >> > How this could affect upgrade, I am sure there are deployments older > than pike, and those at a point will > >> > got for some newer version (I hope we can give them good answers for > their problems as Openstack) > >> > > >> > What do you think about stadium projects? As those have much less > activity (as mostly solve one rather specific problem), > >> > and much less migration scripts shall we just "merge" those to init > ops? > >> > I checked quickly a few stadium project and only bgpvpn has newer > migration scripts than pike. > >> > >> In my understanding, squashing migrations can be done repository by > repository. > >> A revision hash of each migration is not changed and head revisions > >> are stored in the database per repository, so it should work. > >> For initial deployments, neutron-db-manage runs all db migrations from > >> the initial revision to a specified revision (release), so it has no > >> problem. > >> For upgrade scenarios, this change just means that we just dropped > >> support upgrade from releases included in squashed migrations. > >> For example, if we squash migrations up to rocky (and create > >> rocky_initial migration) in the neutron repo, we no longer support db > >> migration from releases before rocky. This would be the only > >> difference I see. > > > > > > > > I wonder if this is acceptable (that an OpenStack service will not > support db migrations prior to rocky). What is (or is there?) OpenStack's > stance wrt support for upgrades? We are using ocata and plan on upgrading > but we don't know when that might happen :-( > > > > --ruby > > It is not true. What we the upstream community recommend is to upgrade > the controller node and databases in the fast-foward upgrade manner. > Even if the upstream repository just provides database migration from > for example Rocky, you can upgrade from a release older than rocky, by > upgrading one release by one. > In addition, by keeping a specific number of releases in db > migrations, operators can still upgrade from more than one old release > (if they want). > > --amotoki > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Mon Jul 6 07:46:24 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Mon, 6 Jul 2020 09:46:24 +0200 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Hi all, Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. Dmitry On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: > Hi all, > > Since we're switching to 6 releases per year cadence, I think it makes > sense to have short virtual meetups after every release. The goal will be > to sync on priorities, exchange ideas and define plans for the upcoming 2 > months of development. Fooling around is also welcome! > > Please vote for the best 2 hours slot next week: > https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more > potential time zones, so apologies for so many options. 
Please cast your > vote until Friday, 12pm UTC, so that I can announce the final time slot > this week. > > Dmitry > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Mon Jul 6 08:46:13 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Mon, 6 Jul 2020 10:46:13 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: Hi Alfredo, since you mentioned, it is not essential to have that opstool, so I have replaced it with "sysstat" /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map so now it is: "default": { "oschecks_package": "sysstat" } And you are absolutely right regarding delorean, it took only OSP packages from that, and kvm and libvirt are at your specified versions. And then I believe, I found case for failing VM: "6536f105-3f38-41bd-9ddd-6702d23c4ccb] Instance failed to spawn: nova.exception.PortBindingFailed: Binding failed for port af8ecd79-ddb8-4ba1-990d-1ccdb76f1442, please check" so, my question is: I have only control (pxe) network, which is distributed between sites and OSP is having only one network (ControlPlane). How my controller and compute network should look like? My controller network looks like [1] and compute like [2]. When I uncomment in compute br-provider part, it do not deploy. does br-provider networks MUST be interconnectable? I would need to have the possibility with the local network (vxlan) to communicate between instances within the cloud, and external connectivity would be done using provider vlan. each provider VLAN will be used only on one compute node. is it possible? [0] http://paste.openstack.org/show/lUAOzDZdzCCcDrrPCASq/ # full package list in libvirt container [1] http://paste.openstack.org/show/795562/ # controller net-config [2] http://paste.openstack.org/show/795563/ @ compute net-config On Thu, 2 Jul 2020 at 17:36, Alfredo Moralejo Alonso wrote: > > > On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis > wrote: > >> it is, i have image build failing. i can modify yaml used to create >> image. can you remind me which files it would be? >> >> > Right, I see that the patch must not be working fine for centos and the > package is being installed from delorean repos in the log. I guess it > needs an entry to cover the centos 8 case (i'm checking with opstools > maintainer). > > As workaround I'd propose you to use the package from: > > > https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ > > or alternatively applying some local patch to tripleo-puppet-elements. > > >> and your question, "how it can impact kvm": >> >> in image most of the packages get deployed from deloren repos. I believe >> part is from centos repos and part of whole packages in >> overcloud-full.qcow2 are from deloren. so it might have bit different minor >> version, that might be incompactible... at least it have happend for me >> previously with train release so i used tested ci fully from the >> beginning... >> I might be for sure wrong. >> > > Delorean repos contain only OpenStack packages, things like nova, etc... > not kvm or things included in CentOS repos. KVM will always installed which > should be installed from "Advanced Virtualization" repository. 
May you > check what versions of qemu-kvm and libvirt you got installed into the > overcloud-full image?, it should match with the versions in: > > > http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ > > like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jayadityagupta11 at gmail.com Mon Jul 6 08:51:41 2020 From: jayadityagupta11 at gmail.com (jayaditya gupta) Date: Mon, 6 Jul 2020 10:51:41 +0200 Subject: [python-openstackclient] microversion support Message-ID: Hi , we discussed the microversion support in PTG meeting .I would like to get started with it. How can I help with this? Currently OpenStack CLI is defaulting to nova api V2, we want to change it so it takes latest version. for issue reference see this : https://storyboard.openstack.org/#!/story/2007727 Best Regards Jayaditya Gupta -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Mon Jul 6 08:59:37 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Mon, 6 Jul 2020 10:59:37 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: I have created a network with geneve and it worked. Previous network which it used by default was vlan. First of all, thank you Arkady for LogTool ;) Second, how to modify my config, to have VLAN working? NeutronNetworkType: 'vlan,geneve' NeutronTunnelTypes: 'vxlan' NeutronBridgeMappings: 'default:br-provider' NeutronGlobalPhysnetMtu: 1500 NeutronBridgeMappings: datacentre:br-ex NeutronExternalNetworkBridge: 'br-ex' my compute network layout. [1] http://paste.openstack.org/show/795562/ # controller net-config [2] http://paste.openstack.org/show/795563/ @ compute net-config [3] http://paste.openstack.org/show/795564/ # ip a s from compute On Mon, 6 Jul 2020 at 10:46, Ruslanas Gžibovskis wrote: > Hi Alfredo, > > since you mentioned, it is not essential to have that opstool, so I have > replaced it with > "sysstat" /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map so > now it is: > "default": { > "oschecks_package": "sysstat" > } > > And you are absolutely right regarding delorean, it took only OSP packages > from that, and kvm and libvirt are at your specified versions. > > And then I believe, I found case for failing VM: > > "6536f105-3f38-41bd-9ddd-6702d23c4ccb] Instance failed to spawn: > nova.exception.PortBindingFailed: Binding failed for port > af8ecd79-ddb8-4ba1-990d-1ccdb76f1442, please check" > > so, my question is: > I have only control (pxe) network, which is distributed between sites and > OSP is having only one network (ControlPlane). How my controller and > compute network should look like? > My controller network looks like [1] and compute like [2]. When I > uncomment in compute br-provider part, it do not deploy. > does br-provider networks MUST be interconnectable? > > I would need to have the possibility with the local network (vxlan) to > communicate between instances within the cloud, and external connectivity > would be done using provider vlan. each provider VLAN will be used only on > one compute node. is it possible? 
> > > [0] http://paste.openstack.org/show/lUAOzDZdzCCcDrrPCASq/ # full package > list in libvirt container > [1] http://paste.openstack.org/show/795562/ # controller net-config > [2] http://paste.openstack.org/show/795563/ @ compute net-config > > On Thu, 2 Jul 2020 at 17:36, Alfredo Moralejo Alonso > wrote: > >> >> >> On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis >> wrote: >> >>> it is, i have image build failing. i can modify yaml used to create >>> image. can you remind me which files it would be? >>> >>> >> Right, I see that the patch must not be working fine for centos and the >> package is being installed from delorean repos in the log. I guess it >> needs an entry to cover the centos 8 case (i'm checking with opstools >> maintainer). >> >> As workaround I'd propose you to use the package from: >> >> >> https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ >> >> or alternatively applying some local patch to tripleo-puppet-elements. >> >> >>> and your question, "how it can impact kvm": >>> >>> in image most of the packages get deployed from deloren repos. I believe >>> part is from centos repos and part of whole packages in >>> overcloud-full.qcow2 are from deloren. so it might have bit different minor >>> version, that might be incompactible... at least it have happend for me >>> previously with train release so i used tested ci fully from the >>> beginning... >>> I might be for sure wrong. >>> >> >> Delorean repos contain only OpenStack packages, things like nova, etc... >> not kvm or things included in CentOS repos. KVM will always installed which >> should be installed from "Advanced Virtualization" repository. May you >> check what versions of qemu-kvm and libvirt you got installed into the >> overcloud-full image?, it should match with the versions in: >> >> >> http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ >> >> like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm >> >> >>> >>> -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jul 6 09:13:02 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jul 2020 11:13:02 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <20200623102448.eocahkszcd354b5d@skaplons-mac> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: <88D19264-9611-4D44-9F78-02B5E56AFD7E@redhat.com> Hi, Bump. Anyone has got any thoughts about it? > On 23 Jun 2020, at 12:24, Slawek Kaplonski wrote: > > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by the > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different architecture, > moving us away from Python agents communicating with the Neutron API service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. 
> > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and it > works fine. Recently (See [7]) we also proposed to add an OVN based job to the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein cycle > and there were no any significant issues related strictly to this change. It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to > Devstack repo - we don’t want to force everyone else to add “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack > projects to check if it will not break anything in the gate of those projects. > 5. If all will be running fine, merge patches proposed in points 3 and 3a. 
> > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Principal software engineer Red Hat From thierry at openstack.org Mon Jul 6 09:19:23 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 6 Jul 2020 11:19:23 +0200 Subject: [largescale-sig] Next meeting: July 8, 8utc Message-ID: <41af7bd5-5aaa-566d-a99c-dc19873b2422@openstack.org> Hi everyone, Hot on the heels of the OpenDev event on Large scale deployments, the Large Scale SIG will have a meeting this week on Wednesday, July 8 at 8 UTC[1] in the #openstack-meeting-3 channel on IRC: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200708T08 Feel free to add topics to our agenda at: https://etherpad.openstack.org/p/large-scale-sig-meeting A reminder of the TODOs we had from last meeting, in case you have time to make progress on them: - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - ttx to produce a draft of the 10th birthday slide for the SIG Talk to you all on Wednesday, -- Thierry Carrez From tobias.urdin at binero.com Mon Jul 6 09:49:01 2020 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 6 Jul 2020 09:49:01 +0000 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <20200623102448.eocahkszcd354b5d@skaplons-mac> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: <1594028941528.18866@binero.com> Hello Slawek, This is very interesting and I think this is the right way to go, speakin from an operator standpoint here. We've started investing time in getting familiar with OVN, how to operate and how to troubleshoot and are looking forward into offloading a lot of work to OVN in the future. We are closely looking how we can integrate hardware offloading with OVN+OVS to improve our performance and in the future looking to the new VirtIO backend support for vDPA that has started to mature more. >From an operator's view, after getting familiar with OVN, there is a lot of work that needs to be done behind the scenes in order to get to the desired point. * Geneve offloading on NIC, we might need new NICs or new firmware. * We need to migrate away from VXLAN to Geneve encapsulation, how can we migrate our current baremetal approach * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red Hat has driven some work to perform this (an Geneve migration) but there is minimal testing or real world deployments that has tried or documented the approach. * And then all misc stuff, we need to look into the new ovn-metadata-agent, should we move Octavia over to OVN yet? Then the final, what do we gain vs what do we lose in terms of maintainability, performance and features. But form an operator's view, I'm very positive to the future of a OVN integrated OpenStack. 
Best regards Tobias ________________________________________ From: Slawek Kaplonski Sent: Tuesday, June 23, 2020 12:24 PM To: OpenStack Discuss ML Cc: Assaf Muller; Daniel Alvarez Sanchez Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend Hi, The Neutron team wants to propose a switch of the default Neutron backend in Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to OVN with its own ovn-metadata-agent and ovn-controller. We discussed that change during the virtual PTG - see [1]. In this document we want to explain reasons why we want to do that change. OVN in 75 Words --------------- Open Virtual Network is managed under the OVS project, and was created by the original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, using lessons learned throughout the years. It is intended to be used in projects such as OpenStack and Kubernetes. OVN has a different architecture, moving us away from Python agents communicating with the Neutron API service via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. Here’s a heap of information about OpenStack’s integration of OVN: * OpenStack Boston Summit talk on OVN [2] * Upstream OpenStack networking-ovn documentation [3] and [4] * OSP 13 OVN documentation, including how to install it using Director [5] Neutron OVN driver was developed as a Neutron stadium project, "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main Neutron repository. Why? ---- In the Neutron team we believe that OVN and the Neutron OVN driver are built with a modern architecture that offers better foundations for a simpler and more performant solution. We see increased participation in kubernetes-ovn, resulting in a larger core OVN community, and we would like OpenStack to benefit from this Kubernetes driven OVN investment. Neutron OVN driver currently has got some feature parity gaps comparing to ML2/OVS (see [6] for details) but our team is working hard to close those gaps and we believe that this driver is the future for Neutron and that’s why we want to make it the default Neutron ML2 backend in the Devstack configuration. What Does it Mean? ------------------ Since most Openstack projects use Neutron in their CI and gate jobs, this change has the potential for a large impact. But this backend is already tested with various jobs in the Neutron CI and it works fine. Recently (See [7]) we also proposed to add an OVN based job to the Devstack’s check queue. Similarly the default Neutron backend in TripleO was changed in the Stein cycle and there were no any significant issues related strictly to this change. It worked well for other projects. Of course in the Neutron project we will be still gating other drivers, like ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of some of the jobs. The Neutron team is *NOT* going to deprecate any of the other existing ML2 drivers. We will be still maintaining Linuxbridge, OVS and other in-tree drivers in the same way as it is now. Action Plan ----------- We want to make this change before the Victoria-2 milestone to not make such changes too late in the release cycle. Our action plan is as below: 1. Share the plan and get feedback from the upstream community (this thread) 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to Devstack repo - we don’t want to force everyone else to add “enable_plugin neutron” in their local.conf file to use default Neutron backend, 3. 
Switch default Neutron backend in Devstack to be OVN, a. Switch definition of base devstack CI jobs that it will run Neutron with OVN backend, 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack projects to check if it will not break anything in the gate of those projects. 5. If all will be running fine, merge patches proposed in points 3 and 3a. [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 [2] https://www.youtube.com/watch?v=sgc7myiX6ts [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html [4] https://docs.openstack.org/neutron/latest/ovn/index.html [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html [7] https://review.opendev.org/#/c/736021/ -- Slawek Kaplonski Senior software engineer Red Hat From radoslaw.piliszek at gmail.com Mon Jul 6 10:10:50 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jul 2020 12:10:50 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <88D19264-9611-4D44-9F78-02B5E56AFD7E@redhat.com> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <88D19264-9611-4D44-9F78-02B5E56AFD7E@redhat.com> Message-ID: On Mon, Jul 6, 2020 at 11:15 AM Slawek Kaplonski wrote: > > Hi, > > Bump. Anyone has got any thoughts about it? +2, happy to stress OVN OpenStack-wise. :-) -yoctozepto From lyarwood at redhat.com Mon Jul 6 10:57:21 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 6 Jul 2020 11:57:21 +0100 Subject: [nova][stable] The openstack/nova stable/pike branch is currently unmaintained Message-ID: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> Hello all, Following on from my recent mail about the stable/ocata branch of the openstack/nova project now being unmaintained [1] I'd also like to move the stable/pike [2] branch formally into this phase of maintenance [3]. Volunteers are welcome to step forward and attempt to move the branch back to the ``Extended Maintenance`` phase by proposing changes and fixing CI in the next 3 months, otherwise the branch will be marked as ``EOL`` [4]. Again hopefully this isn't taking anyone by surprise but please let me know if this is going to be an issue! Regards, [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html [2] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike [3] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained [4] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From juliaashleykreger at gmail.com Mon Jul 6 13:12:57 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 6 Jul 2020 06:12:57 -0700 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Greetings everyone! We'll use our meetpad[1]! -Julia [1]: https://meetpad.opendev.org/ironic On Mon, Jul 6, 2020 at 12:48 AM Dmitry Tantsur wrote: > > Hi all, > > Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. 
Because of the time conflict, it will replace our weekly meeting. > > Dmitry > > On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: >> >> Hi all, >> >> Since we're switching to 6 releases per year cadence, I think it makes sense to have short virtual meetups after every release. The goal will be to sync on priorities, exchange ideas and define plans for the upcoming 2 months of development. Fooling around is also welcome! >> >> Please vote for the best 2 hours slot next week: https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more potential time zones, so apologies for so many options. Please cast your vote until Friday, 12pm UTC, so that I can announce the final time slot this week. >> >> Dmitry From akekane at redhat.com Mon Jul 6 13:39:24 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Mon, 6 Jul 2020 19:09:24 +0530 Subject: [glance] Global Request ID issues in Glance In-Reply-To: <03b6180a-a287-818c-695e-42c006ce1347@secustack.com> References: <03b6180a-a287-818c-695e-42c006ce1347@secustack.com> Message-ID: Hi Markus, Thank you for detailed analysis. Both cases you pointed out are valid bugs. Could you please report this to launchpad? Thanks & Best Regards, Abhishek Kekane On Fri, Jun 26, 2020 at 6:33 PM Markus Hentsch wrote: > Hello everyone, > > while I was experimenting with the Global Request ID functionality of > OpenStack [1], I identified two issues in Glance related to this topic. > I have written my findings below and would appreciate it if you could > take a look and confirm whether those are intended behaviors or indeed > issues with the implementation. > > In case of the latter please advice me which bug tracker to report them > to. > > > 1. The Glance client does not correctly forward the global ID > > When the SessionClient class is used, the global_request_id is removed > from kwargs in the constructor using pop() [2]. Directly after this, > the parent constructor is called using super(), which in this case is > Adapter from the keystoneauth1 library. Therein the global_request_id > is set again [3] but since it has been removed from the kwargs, it > defaults to None as specified in the Adapter's __init__() header. Thus, > the global_request_id passed to the SessionClient constructor never > actually makes it to the Glance API. This is in contrast to the > HTTPClient class, where get() is used instead of pop() [4]. > > This can be reproduced simply by creating a server in Nova from an > image in Glance, which will attempt to create the Glance client > instance using the global_request_id [5]. Passing the > "X-Openstack-Request-Id" header during the initial API call for the > server creation, makes it visible in Nova (using a suitable > "logging_context_format_string" setting) but it's not visible in > Glance. Using a Python debugger shows Glance generating a new local ID > instead. > > > 2. Glance interprets global ID as local one for Oslo Context objects > > While observing the Glance log file, I observed Glance always logging > the global_request_id instead of a local one if it is available. > > Using "%(global_request_id)s" within "logging_context_format_string"[6] > in the glance-api.conf will always print "None" in the logs whereas > "%(request_id)s" will either be an ID generated by Glance if no global > ID is available or the received global ID. 
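To see both identifiers side by side, the two format keys mentioned here can be combined in glance-api.conf; the exact format string below is only an illustration, not the reporter's configuration:

    [DEFAULT]
    # oslo.log option; %(request_id)s is the local ID, %(global_request_id)s
    # the inbound X-Openstack-Request-Id (or None)
    logging_context_format_string = %(asctime)s %(levelname)s %(name)s [req: %(request_id)s global: %(global_request_id)s] %(message)s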
> > Culprit seems to be the context middleware of Glance where the global > ID in form of the "X-Openstack-Request-Id" header is parsed from the > request and passed as "request_id" instead of "global_request_id" to > the "glance.context.RequestContext.from_environ()" call [7]. > > This is in contrast to other services such as Nova or Neutron where > the two variables actually print the values according to their name > (request_id always being the local one, whereas global_request_id is > the global one or None). > > > [1] > > https://specs.openstack.org/openstack/oslo-specs/specs/pike/global-req-id.html > [2] > > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L355 > [3] > > https://github.com/openstack/keystoneauth/blob/dab8e1057ae8bb9a0e778fb8d3141ad4fb36a339/keystoneauth1/adapter.py#L166 > [4] > > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L162 > [5] > > https://github.com/openstack/nova/blob/1cae0cd7229207478b70275509aecd778ca69225/nova/image/glance.py#L78 > [6] > > https://docs.openstack.org/oslo.context/2.17.0/user/usage.html#context-variables > [7] > > https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/api/middleware/context.py#L201 > > > Thanks! > > Markus > > -- > Markus Hentsch > Team Leader > > secustack GmbH - Digital Sovereignty in the Cloud > https://www.secustack.com > Königsbrücker Straße 96 (Gebäude 30) | 01099 Dresden > District Court Dresden, Register Number: HRB 38890 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ionut at fleio.com Mon Jul 6 14:17:10 2020 From: ionut at fleio.com (Ionut Biru) Date: Mon, 6 Jul 2020 17:17:10 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi Rafael, I have an error and I cannot resolve it myself. https://paste.xinu.at/LEfdXD/ Do you happen to know what's wrong? endpoint list https://paste.xinu.at/v3j1jl/ octavia.yaml https://paste.xinu.at/TIxfOz/ polling.yaml https://paste.xinu.at/oBEFj/ pipeline.yaml https://paste.xinu.at/qvEdTX/ On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > Good catch. I fixed the docs. > https://review.opendev.org/#/c/739288/ > > On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: > >> Hi, >> >> I just noticed that the example dynamic.network.services.vpn.connection >> from >> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >> the wrong indentation. >> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >> >> Now I have to see why is not polling from it >> >> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >> >>> Hi Rafael, >>> >>> I think I applied all the reviews successfully but I tried to do an >>> octavia dynamic poller but I have couples of errors. >>> >>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>> >>> if i remove the - in front of name like this: >>> https://paste.xinu.at/K7s5I8/ >>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>> >>> Is there something I missed or is something wrong in yaml? 
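For readers trying to reproduce this, a minimal sketch of an Octavia load balancer definition in the dynamic pollster format is shown below; the attribute names follow the Ceilometer dynamic pollster documentation, while the endpoint type, URL path and metadata fields are assumptions to verify against your deployment (this is not the octavia.yaml from the pastes above):

    ---
    - name: "dynamic.network.services.lb.loadbalancer"
      sample_type: "gauge"
      unit: "loadbalancer"
      endpoint_type: "load-balancer"
      url_path: "v2/lbaas/loadbalancers"
      response_entries_key: "loadbalancers"
      value_attribute: "provisioning_status"
      value_mapping:
        ACTIVE: "1"
        ERROR: "0"
      default_value: 0
      metadata_fields:
        - "name"
        - "vip_address"
        - "provider"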
>>> >>> >>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> >>>> Since the merging window for ussuri was long passed for those commits, >>>>> is it safe to assume that it will not land in stable/ussuri at all and >>>>> those will be available for victoria? >>>>> >>>> >>>> I would say so. We are lacking people to review and then merge it. >>>> >>>> How safe is to cherry pick those commits and use them in production? >>>>> >>>> As long as the person executing the cherry-picks, and maintaining the >>>> code knows what she/he is doing, you should be safe. The guys that are >>>> using this implementation (and others that I and my colleagues proposed), >>>> have a few openstack components that are customized with the >>>> patches/enhancements/extensions we developed so far; this means, they are >>>> not using the community version, but something in-between (the community >>>> releases + the patches we did). Of course, it is only possible, because we >>>> are the ones creating and maintaining these codes; therefore, we can assure >>>> quality for production. >>>> >>>> >>>> >>>> >>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>> >>>>> Hello Rafael, >>>>> >>>>> Since the merging window for ussuri was long passed for those commits, >>>>> is it safe to assume that it will not land in stable/ussuri at all and >>>>> those will be available for victoria? >>>>> >>>>> How safe is to cherry pick those commits and use them in production? >>>>> >>>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>>>> However, there are some important PRs still waiting for a merge, that might >>>>>> be important for your use case: >>>>>> * https://review.opendev.org/#/c/722092/ >>>>>> * https://review.opendev.org/#/c/715180/ >>>>>> * https://review.opendev.org/#/c/715289/ >>>>>> * https://review.opendev.org/#/c/679999/ >>>>>> * https://review.opendev.org/#/c/709807/ >>>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>> cgoncalves at redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> I want to meter the loadbalancer into gnocchi for billing purposes >>>>>>>> in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>> >>>>>>> >>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>> >>>>>>> >>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>> >>>>>>> >>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>> the Ceilometer project. 
>>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Ionut, >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hello guys, >>>>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>>>> the following: >>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>> - network.services.lb.listener >>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>> - network.services.lb.member >>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>> - network.services.lb.pool >>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>> >>>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>>> supported in neutron. >>>>>>>>>> >>>>>>>>>> I found >>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>> but this is not available in stein or train. >>>>>>>>>> >>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>> octavia. >>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>> deployed and has status active. >>>>>>>>>> >>>>>>>>> >>>>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>>>>> three API endpoints for statistics [2][3][4]. >>>>>>>>> >>>>>>>>> I hope this helps with your use case. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Carlos >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>> [2] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>> [3] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>> [4] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Jul 6 14:57:27 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 6 Jul 2020 10:57:27 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. 
# Patches ## Open Review - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 - Update goal selection docs to clarify the goal count https://review.opendev.org/739150 - Add legacy repository validation https://review.opendev.org/737559 - Add "tc:approved-release" tag to manila https://review.opendev.org/738105 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 ## Project Updates - Add deprecated cycle for deprecated deliverables https://review.opendev.org/737590 - No longer track refstack repos in governance https://review.opendev.org/737962 - Add Neutron Arista plugin charm to OpenStack charms https://review.opendev.org/737734 ## General Changes - Add links to chosen release names https://review.opendev.org/738867 - Add storyboard link to migrate-to-focal goal https://review.opendev.org/738129 - TC Guide Follow Ups https://review.opendev.org/737650 # Email Threads - New Office Hours: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015761.html - OSU Intern Work: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015760.html - Summit Programming Committee Nominations Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015756.html - Summit CFP Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015730.html # Other Reminders - OpenStack's 10th anniversary community meeting should be happening July 16th: more info coming soon! - If you're an operator, make sure you fill out our user survey: https://www.openstack.org/user-survey/survey-2020/ Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From elod.illes at est.tech Mon Jul 6 16:02:17 2020 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Mon, 6 Jul 2020 18:02:17 +0200 Subject: [nova][stable] The openstack/nova stable/pike branch is currently unmaintained In-Reply-To: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> References: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> Message-ID: <57f7b5e7-3838-0ce5-4601-80eb7585e41b@est.tech> Just a heads-up that a devstack patch [1] addresses the issues in Pike. As soon as that is merging, stable/pike hopefully will be ready to accept fixes. I'll try to keep Pike working, but of course, anyone who is interested to help are welcome. :) [1] https://review.opendev.org/#/c/735616/ Thanks, Előd On 2020. 07. 06. 12:57, Lee Yarwood wrote: > Hello all, > > Following on from my recent mail about the stable/ocata branch of the > openstack/nova project now being unmaintained [1] I'd also like to move > the stable/pike [2] branch formally into this phase of maintenance [3]. > > Volunteers are welcome to step forward and attempt to move the branch > back to the ``Extended Maintenance`` phase by proposing changes and > fixing CI in the next 3 months, otherwise the branch will be marked as > ``EOL`` [4]. > > Again hopefully this isn't taking anyone by surprise but please let me > know if this is going to be an issue! 
> > Regards, > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html > [2] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike > [3] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained > [4] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life > From juliaashleykreger at gmail.com Mon Jul 6 16:15:16 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 6 Jul 2020 09:15:16 -0700 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Greetings fellow humans! We had a great two hour session but we ran out of time to get back to the discussion of a capability/driver support matrix. We agreed we should have a call later in the week to dive back into the topic. I've created a doodle[1] for us to identify the best time for a hopefully quick 30 minute call to try and reach consensus. Thanks everyone! -Julia [1]: https://doodle.com/poll/kte79im2tz4ape9v On Mon, Jul 6, 2020 at 6:12 AM Julia Kreger wrote: > > Greetings everyone! > > We'll use our meetpad[1]! > > -Julia > > [1]: https://meetpad.opendev.org/ironic > > On Mon, Jul 6, 2020 at 12:48 AM Dmitry Tantsur wrote: > > > > Hi all, > > > > Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. > > > > Dmitry > > > > On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: > >> > >> Hi all, > >> > >> Since we're switching to 6 releases per year cadence, I think it makes sense to have short virtual meetups after every release. The goal will be to sync on priorities, exchange ideas and define plans for the upcoming 2 months of development. Fooling around is also welcome! > >> > >> Please vote for the best 2 hours slot next week: https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more potential time zones, so apologies for so many options. Please cast your vote until Friday, 12pm UTC, so that I can announce the final time slot this week. > >> > >> Dmitry From rafaelweingartner at gmail.com Mon Jul 6 17:11:47 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 6 Jul 2020 14:11:47 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: It looks like a coding error that we left behind during a major refactoring that we introduced upstream. I created a patch for it. Can you check/review and test it? https://review.opendev.org/739555 On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: > Hi Rafael, > > I have an error and I cannot resolve it myself. > > https://paste.xinu.at/LEfdXD/ > > Do you happen to know what's wrong? > > endpoint list https://paste.xinu.at/v3j1jl/ > octavia.yaml https://paste.xinu.at/TIxfOz/ > polling.yaml https://paste.xinu.at/oBEFj/ > pipeline.yaml https://paste.xinu.at/qvEdTX/ > > > On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> Good catch. I fixed the docs. >> https://review.opendev.org/#/c/739288/ >> >> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >> >>> Hi, >>> >>> I just noticed that the example dynamic.network.services.vpn.connection >>> from >>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>> the wrong indentation. >>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. 
>>> >>> Now I have to see why is not polling from it >>> >>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>> >>>> Hi Rafael, >>>> >>>> I think I applied all the reviews successfully but I tried to do an >>>> octavia dynamic poller but I have couples of errors. >>>> >>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>> >>>> if i remove the - in front of name like this: >>>> https://paste.xinu.at/K7s5I8/ >>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>> >>>> Is there something I missed or is something wrong in yaml? >>>> >>>> >>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> >>>>> Since the merging window for ussuri was long passed for those commits, >>>>>> is it safe to assume that it will not land in stable/ussuri at all and >>>>>> those will be available for victoria? >>>>>> >>>>> >>>>> I would say so. We are lacking people to review and then merge it. >>>>> >>>>> How safe is to cherry pick those commits and use them in production? >>>>>> >>>>> As long as the person executing the cherry-picks, and maintaining the >>>>> code knows what she/he is doing, you should be safe. The guys that are >>>>> using this implementation (and others that I and my colleagues proposed), >>>>> have a few openstack components that are customized with the >>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>> not using the community version, but something in-between (the community >>>>> releases + the patches we did). Of course, it is only possible, because we >>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>> quality for production. >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>> >>>>>> Hello Rafael, >>>>>> >>>>>> Since the merging window for ussuri was long passed for those >>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>> and those will be available for victoria? >>>>>> >>>>>> How safe is to cherry pick those commits and use them in production? >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>> rafaelweingartner at gmail.com> wrote: >>>>>> >>>>>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>>>>> However, there are some important PRs still waiting for a merge, that might >>>>>>> be important for your use case: >>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>> cgoncalves at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I want to meter the loadbalancer into gnocchi for billing purposes >>>>>>>>> in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>> >>>>>>>> >>>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>> >>>>>>>> >>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>> >>>>>>>> >>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>>> the Ceilometer project. 
>>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Ionut, >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hello guys, >>>>>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>>>>> the following: >>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>> - network.services.lb.member >>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>> >>>>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>>>> supported in neutron. >>>>>>>>>>> >>>>>>>>>>> I found >>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>> >>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>> octavia. >>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>> deployed and has status active. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>>>>> the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>> >>>>>>>>>> I hope this helps with your use case. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Carlos >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>> [2] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>> [3] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>> [4] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Rafael Weingärtner >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Mon Jul 6 18:37:07 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 14:37:07 -0400 Subject: removing use of pkg_resources to improve command line app performance Message-ID: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> We have had a long-standing issue with the performance of the openstack command line tool. 
At least part of the startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command line apps). Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and produces data in a format that can be cached to make it even faster. I have started adding support for that caching to stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the same library is available on PyPI as “importlib_metadata”. A big part of the implementation work will actually be removing the use of pkg_resources in places other than stevedore. We have a couple of different use patterns to consider and replace in different ways. First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) of the available plugins by name. Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster because importlib goes directly to the metadata file for the named package instead of looking through all of the installed packages. Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the manager abstractions in stevedore instead of manipulating EntryPoint instances directly. I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. Doug [0] https://review.opendev.org/#/c/739306/ [1] https://docs.openstack.org/stevedore/latest/ [2] https://review.opendev.org/#/c/739379/2 [3] https://review.opendev.org/#/q/topic:osc-performance From smooney at redhat.com Mon Jul 6 18:54:05 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 06 Jul 2020 19:54:05 +0100 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> Message-ID: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: > We have had a long-standing issue with the performance of the openstack command line tool. 
At least part of the > startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of > importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a > command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command > line apps). > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and > produces data in a format that can be cached to make it even faster. I have started adding support for that caching to > stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the > same library is available on PyPI as “importlib_metadata”. based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? > > A big part of the implementation work will actually be removing the use of pkg_resources in places other than > stevedore. We have a couple of different use patterns to consider and replace in different ways. > > First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to > choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for > all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager > directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) > of the available plugins by name. > > Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s > installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster > because importlib goes directly to the metadata file for the named package instead of looking through all of the > installed packages. > > Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need > to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. > The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in > stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the > manager abstractions in stevedore instead of manipulating EntryPoint instances directly. > > I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely > to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the > work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. 
> > Doug > > [0] https://review.opendev.org/#/c/739306/ > [1] https://docs.openstack.org/stevedore/latest/ > [2] https://review.opendev.org/#/c/739379/2 > [3] https://review.opendev.org/#/q/topic:osc-performance > From fungi at yuggoth.org Mon Jul 6 19:02:46 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 6 Jul 2020 19:02:46 +0000 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: <20200706190246.c4u7thhjixgavbjj@yuggoth.org> On 2020-07-06 19:54:05 +0100 (+0100), Sean Mooney wrote: > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: [...] > > Python 3.8 added a new library importlib.metadata, which also > > has an entry points API. It is more efficient, and produces data > > in a format that can be cached to make it even faster. I have > > started adding support for that caching to stevedore [0], which > > is the Oslo library for managing application plugins. For > > version of python earlier than 3.8, the same library is > > available on PyPI as “importlib_metadata”. > > based on > https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst > we still need to support 3.6 for victoria. is there a backport lib > like mock for this on older python releases? [...] According to https://pypi.org/project/importlib-metadata/ the current version (1.7.0) supports Python 3.5 and later. Won't that work? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From doug at doughellmann.com Mon Jul 6 19:03:00 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 15:03:00 -0400 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: > On Jul 6, 2020, at 2:54 PM, Sean Mooney wrote: > > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: >> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the >> startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of >> importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a >> command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command >> line apps). >> >> Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and >> produces data in a format that can be cached to make it even faster. I have started adding support for that caching to >> stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the >> same library is available on PyPI as “importlib_metadata”. > based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need > to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? Yes, importlib_metadata is on PyPI and available all the way back to 2.7. 
It is already in the requirements list, and if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which version of the library is invoked will be hidden. >> >> A big part of the implementation work will actually be removing the use of pkg_resources in places other than >> stevedore. We have a couple of different use patterns to consider and replace in different ways. >> >> First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to >> choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for >> all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager >> directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) >> of the available plugins by name. >> >> Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s >> installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster >> because importlib goes directly to the metadata file for the named package instead of looking through all of the >> installed packages. >> >> Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need >> to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. >> The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in >> stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the >> manager abstractions in stevedore instead of manipulating EntryPoint instances directly. >> >> I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely >> to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the >> work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. >> >> Doug >> >> [0] https://review.opendev.org/#/c/739306/ >> [1] https://docs.openstack.org/stevedore/latest/ >> [2] https://review.opendev.org/#/c/739379/2 >> [3] https://review.opendev.org/#/q/topic:osc-performance From radoslaw.piliszek at gmail.com Mon Jul 6 19:06:05 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jul 2020 21:06:05 +0200 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: On Mon, Jul 6, 2020 at 9:00 PM Sean Mooney wrote: > > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: > > We have had a long-standing issue with the performance of the openstack command line tool. At least part of the > > startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of > > importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a > > command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command > > line apps). 
> > > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and > > produces data in a format that can be cached to make it even faster. I have started adding support for that caching to > > stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the > > same library is available on PyPI as “importlib_metadata”. > based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need > to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? Is [1] that Doug mentioned not what you mean? It seems to support 3.5+ As a general remark, I've already seen the WIP. Very excited to see this performance bottleneck eliminated. [1] https://pypi.org/project/importlib-metadata/ -yoctozepto From doug at doughellmann.com Mon Jul 6 19:21:06 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 15:21:06 -0400 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> Message-ID: <1C860D98-B45B-4AC7-8BE4-5A1DCFEBD15C@doughellmann.com> > On Jul 6, 2020, at 2:37 PM, Doug Hellmann wrote: > > We have had a long-standing issue with the performance of the openstack command line tool. At least part of the startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command line apps). > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and produces data in a format that can be cached to make it even faster. I have started adding support for that caching to stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the same library is available on PyPI as “importlib_metadata”. > > A big part of the implementation work will actually be removing the use of pkg_resources in places other than stevedore. We have a couple of different use patterns to consider and replace in different ways. > > First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) of the available plugins by name. > > Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster because importlib goes directly to the metadata file for the named package instead of looking through all of the installed packages. > > Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need to be updated. 
The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the manager abstractions in stevedore instead of manipulating EntryPoint instances directly. > > I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. > > Doug > > [0] https://review.opendev.org/#/c/739306/ > [1] https://docs.openstack.org/stevedore/latest/ > [2] https://review.opendev.org/#/c/739379/2 > [3] https://review.opendev.org/#/q/topic:osc-performance I neglected to mention that there are uses of pkg_resources outside of OpenStack code in libraries used by python-openstackclient. I found a use in dogpile and another in cmd2. I haven’t started working on patches to those, yet. If someone wants to do a more extensive search that would be very helpful. I started an etherpad to keep track of the work that’s in progress: https://etherpad.opendev.org/p/osc-performance From fungi at yuggoth.org Mon Jul 6 19:29:24 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 6 Jul 2020 19:29:24 +0000 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <1C860D98-B45B-4AC7-8BE4-5A1DCFEBD15C@doughellmann.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <1C860D98-B45B-4AC7-8BE4-5A1DCFEBD15C@doughellmann.com> Message-ID: <20200706192924.xl76yl5k4ct47gh3@yuggoth.org> On 2020-07-06 15:21:06 -0400 (-0400), Doug Hellmann wrote: [...] > I neglected to mention that there are uses of pkg_resources > outside of OpenStack code in libraries used by > python-openstackclient. I found a use in dogpile and another in > cmd2. I haven’t started working on patches to those, yet. If > someone wants to do a more extensive search that would be very > helpful. I started an etherpad to keep track of the work that’s in > progress: https://etherpad.opendev.org/p/osc-performance Looking at some other uses of pkg_resources, seems like this would be the new way to get the abbreviated Git commit ID stored by PBR: json.loads( importlib.metadata.distribution(packagename).read_text('pbr.json') )['git_version'] -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From smooney at redhat.com Mon Jul 6 19:30:45 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 06 Jul 2020 20:30:45 +0100 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: On Mon, 2020-07-06 at 15:03 -0400, Doug Hellmann wrote: > > On Jul 6, 2020, at 2:54 PM, Sean Mooney wrote: > > > > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: > > > We have had a long-standing issue with the performance of the openstack command line tool. 
At least part of the > > > startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of > > > importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by > > > a > > > command line application (long-running services are candidates, too, but the benefit is bigger in short-lived > > > command > > > line apps). > > > > > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and > > > produces data in a format that can be cached to make it even faster. I have started adding support for that > > > caching to > > > stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, > > > the > > > same library is available on PyPI as “importlib_metadata”. > > > > based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need > > to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? > > Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and > if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which > version of the library is invoked will be hidden. cool i will need to check os-vif more closely but i think we do everthing via the stevedore extension manager https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L38-L49 maybe some plugins are doing some things tehy should not but the intent was to rely only on stevedore and its apis. so it sound like this should just work for os-vif at least. > > > > > > > A big part of the implementation work will actually be removing the use of pkg_resources in places other than > > > stevedore. We have a couple of different use patterns to consider and replace in different ways. > > > > > > First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of > > > them to > > > choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation > > > for > > > all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a > > > stevedore.ExtensionManager > > > directly, but the other managers are meant to implement common access patterns like selecting a subset (or just > > > one) > > > of the available plugins by name. > > > > > > Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s > > > installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* > > > faster > > > because importlib goes directly to the metadata file for the named package instead of looking through all of the > > > installed packages. > > > > > > Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may > > > need > > > to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from > > > pkg_resources. > > > The same data is there, but sometimes it is named differently. 
If we need a compatibility layer we could put that > > > in > > > stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use > > > the > > > manager abstractions in stevedore instead of manipulating EntryPoint instances directly. > > > > > > I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s > > > likely > > > to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the > > > work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. > > > > > > Doug > > > > > > [0] https://review.opendev.org/#/c/739306/ > > > [1] https://docs.openstack.org/stevedore/latest/ > > > [2] https://review.opendev.org/#/c/739379/2 > > > [3] https://review.opendev.org/#/q/topic:osc-performance > > From doug at doughellmann.com Mon Jul 6 19:33:05 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 15:33:05 -0400 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: <6BBE8880-0CCC-40A1-8BAB-C9A992B310B5@doughellmann.com> > On Jul 6, 2020, at 3:30 PM, Sean Mooney wrote: > > On Mon, 2020-07-06 at 15:03 -0400, Doug Hellmann wrote: >>> On Jul 6, 2020, at 2:54 PM, Sean Mooney wrote: >>> >>> On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: >>>> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the >>>> startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of >>>> importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by >>>> a >>>> command line application (long-running services are candidates, too, but the benefit is bigger in short-lived >>>> command >>>> line apps). >>>> >>>> Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and >>>> produces data in a format that can be cached to make it even faster. I have started adding support for that >>>> caching to >>>> stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, >>>> the >>>> same library is available on PyPI as “importlib_metadata”. >>> >>> based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need >>> to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? >> >> Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and >> if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which >> version of the library is invoked will be hidden. > cool i will need to check os-vif more closely but i think we do everthing via the stevedore extension manager > https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L38-L49 > maybe some plugins are doing some things tehy should not but the intent was to rely only on stevedore and its apis. > so it sound like this should just work for os-vif at least. That’s definitely the goal of putting the cache behind the stevedore API. 
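For anyone updating other repos to follow this, here is a rough sketch of the two replacements discussed in this thread. The entry point group "example.plugins" and the distribution name "example-package" below are placeholders rather than real names, so treat it as illustrative only:

# Old pattern (pulls in pkg_resources and scans every installed package):
#   import pkg_resources
#   for ep in pkg_resources.iter_entry_points('example.plugins'):
#       plugin = ep.load()
#
# New pattern: let stevedore own the entry point handling.
from stevedore import extension

mgr = extension.ExtensionManager(
    namespace='example.plugins',   # entry point group to load
    invoke_on_load=False,          # load the plugin classes, do not instantiate them
)
for ext in mgr:
    print(ext.name, ext.plugin)    # ext.plugin is what ep.load() used to return

# Old pattern for finding an installed package's version:
#   version = pkg_resources.get_distribution('example-package').version
#
# New pattern: read only the named package's metadata.
try:
    from importlib.metadata import version   # Python 3.8+
except ImportError:
    from importlib_metadata import version   # PyPI backport for older interpreters

print(version('example-package'))

The other manager classes in the stevedore docs (NamedExtensionManager, DriverManager, EnabledExtensionManager, and so on) cover the more specific access patterns, so most callers should not need to touch EntryPoint internals at all.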
>> >>>> >>>> A big part of the implementation work will actually be removing the use of pkg_resources in places other than >>>> stevedore. We have a couple of different use patterns to consider and replace in different ways. >>>> >>>> First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of >>>> them to >>>> choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation >>>> for >>>> all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a >>>> stevedore.ExtensionManager >>>> directly, but the other managers are meant to implement common access patterns like selecting a subset (or just >>>> one) >>>> of the available plugins by name. >>>> >>>> Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s >>>> installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* >>>> faster >>>> because importlib goes directly to the metadata file for the named package instead of looking through all of the >>>> installed packages. >>>> >>>> Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may >>>> need >>>> to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from >>>> pkg_resources. >>>> The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that >>>> in >>>> stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use >>>> the >>>> manager abstractions in stevedore instead of manipulating EntryPoint instances directly. >>>> >>>> I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s >>>> likely >>>> to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the >>>> work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. >>>> >>>> Doug >>>> >>>> [0] https://review.opendev.org/#/c/739306/ >>>> [1] https://docs.openstack.org/stevedore/latest/ >>>> [2] https://review.opendev.org/#/c/739379/2 >>>> [3] https://review.opendev.org/#/q/topic:osc-performance From lyarwood at redhat.com Mon Jul 6 21:50:04 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 6 Jul 2020 22:50:04 +0100 Subject: [nova][stable] The openstack/nova stable/pike branch is currently unmaintained In-Reply-To: <57f7b5e7-3838-0ce5-4601-80eb7585e41b@est.tech> References: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> <57f7b5e7-3838-0ce5-4601-80eb7585e41b@est.tech> Message-ID: <20200706215004.e74qvc45ypa3umd3@lyarwood.usersys.redhat.com> On 06-07-20 18:02:17, Előd Illés wrote: > Just a heads-up that a devstack patch [1] addresses the issues in Pike. As > soon as that is merging, stable/pike hopefully will be ready to accept > fixes. I'll try to keep Pike working, but of course, anyone who is > interested to help are welcome. :) > > [1] https://review.opendev.org/#/c/735616/ Excellent thanks Előd! Assuming that change lands in devstack and we start landing changes in openstack/nova again then the branch will return to the Extended Maintenance phase. > On 2020. 07. 06. 
12:57, Lee Yarwood wrote: > > Hello all, > > > > Following on from my recent mail about the stable/ocata branch of the > > openstack/nova project now being unmaintained [1] I'd also like to move > > the stable/pike [2] branch formally into this phase of maintenance [3]. > > > > Volunteers are welcome to step forward and attempt to move the branch > > back to the ``Extended Maintenance`` phase by proposing changes and > > fixing CI in the next 3 months, otherwise the branch will be marked as > > ``EOL`` [4]. > > > > Again hopefully this isn't taking anyone by surprise but please let me > > know if this is going to be an issue! > > > > Regards, > > > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html > > [2] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike > > [3] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained > > [4] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From zhengyupann at 163.com Tue Jul 7 07:39:03 2020 From: zhengyupann at 163.com (Zhengyu Pan) Date: Tue, 7 Jul 2020 15:39:03 +0800 (CST) Subject: [neutron][lbaas][octavia] How to implement health check using 100.64.0.0/10 network segments in loadbalancer? Message-ID: <45299896.5594.1732836be3d.Coremail.zhengyupann@163.com> There are some private cloud or public cloud introduction: They use 100.64.0.0/14 network segments to check vm's health status in load balancer. In Region supporting VPC, load balancing private network IP and health check IP will be switched to 100 network segment. I can't understand how to implement it. How to do it? -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From ionut at fleio.com Tue Jul 7 07:52:46 2020 From: ionut at fleio.com (Ionut Biru) Date: Tue, 7 Jul 2020 10:52:46 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Seems to work fine now. Thanks. On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > It looks like a coding error that we left behind during a major > refactoring that we introduced upstream. > I created a patch for it. Can you check/review and test it? > https://review.opendev.org/739555 > > On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: > >> Hi Rafael, >> >> I have an error and I cannot resolve it myself. >> >> https://paste.xinu.at/LEfdXD/ >> >> Do you happen to know what's wrong? >> >> endpoint list https://paste.xinu.at/v3j1jl/ >> octavia.yaml https://paste.xinu.at/TIxfOz/ >> polling.yaml https://paste.xinu.at/oBEFj/ >> pipeline.yaml https://paste.xinu.at/qvEdTX/ >> >> >> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> Good catch. I fixed the docs. >>> https://review.opendev.org/#/c/739288/ >>> >>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>> >>>> Hi, >>>> >>>> I just noticed that the example dynamic.network.services.vpn.connection >>>> from >>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>> the wrong indentation. >>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. 
>>>> >>>> Now I have to see why is not polling from it >>>> >>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>> >>>>> Hi Rafael, >>>>> >>>>> I think I applied all the reviews successfully but I tried to do an >>>>> octavia dynamic poller but I have couples of errors. >>>>> >>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>>> >>>>> if i remove the - in front of name like this: >>>>> https://paste.xinu.at/K7s5I8/ >>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>> >>>>> Is there something I missed or is something wrong in yaml? >>>>> >>>>> >>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> >>>>>> Since the merging window for ussuri was long passed for those >>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>> and those will be available for victoria? >>>>>>> >>>>>> >>>>>> I would say so. We are lacking people to review and then merge it. >>>>>> >>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>> >>>>>> As long as the person executing the cherry-picks, and maintaining the >>>>>> code knows what she/he is doing, you should be safe. The guys that are >>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>> have a few openstack components that are customized with the >>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>> not using the community version, but something in-between (the community >>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>> quality for production. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>> >>>>>>> Hello Rafael, >>>>>>> >>>>>>> Since the merging window for ussuri was long passed for those >>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>> and those will be available for victoria? >>>>>>> >>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>> >>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>> that might be important for your use case: >>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>> >>>>>>>>> >>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? 
>>>>>>>>>> >>>>>>>>> >>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>>>> the Ceilometer project. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Ionut, >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello guys, >>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>> >>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>>>>> supported in neutron. >>>>>>>>>>>> >>>>>>>>>>>> I found >>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>> >>>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>>> octavia. >>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>> returns the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>>> >>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Carlos >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>> [2] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>> [3] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>> [4] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Rafael Weingärtner >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anilj.mailing at gmail.com Tue Jul 7 07:54:13 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Tue, 7 Jul 2020 00:54:13 -0700 Subject: OpenStack cluster event notification Message-ID: Hi All, So far, based on my understanding of OpenStack Python SDK, I am able to read the Hypervisor, Servers instances, however, I do not see an API to receive and handle the change notification/events for the operations that happens on the cluster e.g. A new VM is added, an existing VM is deleted etc. I see a documentation, which talks about emitting notifications over a message bus that indicate different events that occur within the service. Notifications in OpenStack https://docs.openstack.org/ironic/latest/admin/notifications.html 1. Does Openstack Python SDK support notification APIs? 2. How do I receive/monitor notifications for VM related changes? 3. How do I receive/monitor notifications for compute/hypervisor related changes? 4. How do I receive/monitor notifications for Virtual Switch related changes? Thanks in advance for any help in this regard. /anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.schaefer at cloudandheat.com Tue Jul 7 08:43:47 2020 From: jonas.schaefer at cloudandheat.com (Jonas =?ISO-8859-1?Q?Sch=E4fer?=) Date: Tue, 07 Jul 2020 10:43:47 +0200 Subject: Neutron bandwidth metering based on remote address Message-ID: <2890841.xduM2AgYMW@antares> Dear list, We are trying to implement tenant bandwidth metering at the neutron router level. Since some of the network spaces connected to the external interface of the neutron router are supposed to be unmetered, we need to match on the remote address. Conveniently, there exists a --remote-ip-prefix option on meter label create; however, since [1], its meaning was changed to the exact opposite: Instead of matching on the *remote* prefix (towards the external interface), it matches on the *local* prefix (towards the OS tenant network). In an ideal world, we would want to revert that change and instead introduce a --local-ip-prefix option which covers that use-case. I suppose this is not a thing we *should* do though, given that this change made it into a few releases already. Instead, we’ll have to create a new option (which whatever name) + associated database schema + iptables rule patterns to implement the feature. The questions associated with this are now: - Does this make absolutely no sense to anyone? - What is the process for this? I suppose since this change was made intentionally and passed review, our desired change needs to go through a feature request process (blueprints maybe?). kind regards, Jonas Schäfer [1]: https://opendev.org/openstack/neutron/commit/ 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 -- Jonas Schäfer DevOps Engineer Cloud&Heat Technologies GmbH Königsbrücker Straße 96 | 01099 Dresden +49 351 479 367 37 jonas.schaefer at cloudandheat.com | www.cloudandheat.com New Service: Managed Kubernetes designed for AI & ML https://managed-kubernetes.cloudandheat.com/ Commercial Register: District Court Dresden Register Number: HRB 30549 VAT ID No.: DE281093504 Managing Director: Nicolas Röhrs Authorized signatory: Dr. Marius Feldmann Authorized signatory: Kristina Rübenkamp -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. 
URL: From ionut at fleio.com Tue Jul 7 10:49:05 2020 From: ionut at fleio.com (Ionut Biru) Date: Tue, 7 Jul 2020 13:49:05 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hello again, What's the proper way to handle dynamic pollsters in gnocchi ? Right now ceilometer returns: WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia is not handled by Gnocchi I found https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html but I'm not sure if is the right direction. On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: > Seems to work fine now. Thanks. > > On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> It looks like a coding error that we left behind during a major >> refactoring that we introduced upstream. >> I created a patch for it. Can you check/review and test it? >> https://review.opendev.org/739555 >> >> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >> >>> Hi Rafael, >>> >>> I have an error and I cannot resolve it myself. >>> >>> https://paste.xinu.at/LEfdXD/ >>> >>> Do you happen to know what's wrong? >>> >>> endpoint list https://paste.xinu.at/v3j1jl/ >>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>> polling.yaml https://paste.xinu.at/oBEFj/ >>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>> >>> >>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> Good catch. I fixed the docs. >>>> https://review.opendev.org/#/c/739288/ >>>> >>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>> >>>>> Hi, >>>>> >>>>> I just noticed that the example >>>>> dynamic.network.services.vpn.connection from >>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>> the wrong indentation. >>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>> >>>>> Now I have to see why is not polling from it >>>>> >>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>> >>>>>> Hi Rafael, >>>>>> >>>>>> I think I applied all the reviews successfully but I tried to do an >>>>>> octavia dynamic poller but I have couples of errors. >>>>>> >>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>>>> >>>>>> if i remove the - in front of name like this: >>>>>> https://paste.xinu.at/K7s5I8/ >>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>> >>>>>> Is there something I missed or is something wrong in yaml? >>>>>> >>>>>> >>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>> rafaelweingartner at gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>> and those will be available for victoria? >>>>>>>> >>>>>>> >>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>> >>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>>> >>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>> the code knows what she/he is doing, you should be safe. 
The guys that are >>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>> have a few openstack components that are customized with the >>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>> not using the community version, but something in-between (the community >>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>> quality for production. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>>> >>>>>>>> Hello Rafael, >>>>>>>> >>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>> and those will be available for victoria? >>>>>>>> >>>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>> >>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>> that might be important for your use case: >>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>>>>> the Ceilometer project. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>> >>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>> were supported in neutron. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I found >>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>> >>>>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>>>> octavia. >>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>> returns the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>>>> >>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Carlos >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>> [2] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>> [3] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>> [4] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Rafael Weingärtner >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Rafael Weingärtner >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Jul 7 11:43:06 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 7 Jul 2020 08:43:06 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: That is the right direction. I don't know why people hard-coded the initial pollsters' configs and did not document the relation between Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a single system, but interdependent systems to implement a monitoring solution. Ceilometer is the component that gathers data/information, processes, and then persists it somewhere. Gnocchi is one of the options that Ceilometer can use to persist data. By default, Ceilometer creates some basic configurations in Gnocchi to store data, such as some default resource-types with default attributes. However, we do not need (should not) rely on this default config. You can create and use custom resources to fit the stack to your needs. This can be achieved via `gnocchi resource-type create -a :: ` and `gnocchi resource-type create -u :: `. 
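For illustration, a minimal sketch of such a command (the resource-type name and attribute specs here are made-up examples, not values from this thread; the attribute format is attribute_name:attribute_type:required):

    gnocchi resource-type create dynamic_network_octavia \
        -a name:string:false \
        -a project_id:uuid:true
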
Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), you can customize the mapping of metrics to resource-types in Gnocchi. On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: > Hello again, > > What's the proper way to handle dynamic pollsters in gnocchi ? > Right now ceilometer returns: > > WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia is > not handled by Gnocchi > > I found > https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html > but I'm not sure if is the right direction. > > On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: > >> Seems to work fine now. Thanks. >> >> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> It looks like a coding error that we left behind during a major >>> refactoring that we introduced upstream. >>> I created a patch for it. Can you check/review and test it? >>> https://review.opendev.org/739555 >>> >>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>> >>>> Hi Rafael, >>>> >>>> I have an error and I cannot resolve it myself. >>>> >>>> https://paste.xinu.at/LEfdXD/ >>>> >>>> Do you happen to know what's wrong? >>>> >>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>> >>>> >>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> Good catch. I fixed the docs. >>>>> https://review.opendev.org/#/c/739288/ >>>>> >>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I just noticed that the example >>>>>> dynamic.network.services.vpn.connection from >>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>> the wrong indentation. >>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>> >>>>>> Now I have to see why is not polling from it >>>>>> >>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>> >>>>>>> Hi Rafael, >>>>>>> >>>>>>> I think I applied all the reviews successfully but I tried to do an >>>>>>> octavia dynamic poller but I have couples of errors. >>>>>>> >>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>>>>> >>>>>>> if i remove the - in front of name like this: >>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>> >>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>> and those will be available for victoria? >>>>>>>>> >>>>>>>> >>>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>>> >>>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>>>> >>>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>>> the code knows what she/he is doing, you should be safe. 
The guys that are >>>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>>> have a few openstack components that are customized with the >>>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>>> not using the community version, but something in-between (the community >>>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>>> quality for production. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>>>> >>>>>>>>> Hello Rafael, >>>>>>>>> >>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>> and those will be available for victoria? >>>>>>>>> >>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>> production? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>> that might be important for your use case: >>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer >>>>>>>>>>> to the Ceilometer project. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>> >>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>> were supported in neutron. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> I found >>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>>>>> octavia. >>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>>> returns the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>>>>> >>>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Carlos >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>> [2] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>> [3] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>> [4] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Rafael Weingärtner >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Rafael Weingärtner >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Jul 7 12:09:29 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 7 Jul 2020 09:09:29 -0300 Subject: Neutron bandwidth metering based on remote address In-Reply-To: <2890841.xduM2AgYMW@antares> References: <2890841.xduM2AgYMW@antares> Message-ID: Hallo Jonas, I have worked to address this specific use case. First, the part of the solution that is already implemented. If you only need to gather metrics in a tenant fashion, you can take a look into this PR: https://review.opendev.org/#/c/735605/. That pull request enables operators to configure shared traffic labels, and then, these traffic labels will be exposed/published with different granularities. The different granularities are router, tenant, label, router-label, and tenant-label. The complete explanation can be found in the "RST" document that the PR also introduces, where we wrote a complete description of neutron metering, its configs, and usage. 
You are welcome to review and help us get this PR merged :) So far, if all you need is to measure the whole traffic, but in different granularities, that PR will probably be enough. On the other hand, if you need to create more complex rules to filter by source/destination IPs, then we need something else. Interestingly enough, we are working towards that. We will extend neutron API, and neutron metering to allow operators to use "remote-ip" and "source-ip" to create metering labels rules. We also saw the PR that changed the behavior of the "remote-ip" property, and the whole confusion it caused (at least for us). However, instead of proposing to revert it, we are working towards enabling the API to handle "remote-ip" and "source-ip", which will cover the use case of the person that introduced that commit, and many others such as ours and yours (probably). On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < jonas.schaefer at cloudandheat.com> wrote: > Dear list, > > We are trying to implement tenant bandwidth metering at the neutron router > level. Since some of the network spaces connected to the external > interface of > the neutron router are supposed to be unmetered, we need to match on the > remote address. > > Conveniently, there exists a --remote-ip-prefix option on meter label > create; > however, since [1], its meaning was changed to the exact opposite: Instead > of > matching on the *remote* prefix (towards the external interface), it > matches > on the *local* prefix (towards the OS tenant network). > > In an ideal world, we would want to revert that change and instead > introduce a > --local-ip-prefix option which covers that use-case. I suppose this is not > a > thing we *should* do though, given that this change made it into a few > releases already. > > Instead, we’ll have to create a new option (which whatever name) + > associated > database schema + iptables rule patterns to implement the feature. > > The questions associated with this are now: > > - Does this make absolutely no sense to anyone? > - What is the process for this? I suppose since this change was made > intentionally and passed review, our desired change needs to go through a > feature request process (blueprints maybe?). > > kind regards, > Jonas Schäfer > > [1]: https://opendev.org/openstack/neutron/commit/ > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 > > > -- > Jonas Schäfer > DevOps Engineer > > Cloud&Heat Technologies GmbH > Königsbrücker Straße 96 | 01099 Dresden > +49 351 479 367 37 > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > New Service: > Managed Kubernetes designed for AI & ML > https://managed-kubernetes.cloudandheat.com/ > > Commercial Register: District Court Dresden > Register Number: HRB 30549 > VAT ID No.: DE281093504 > Managing Director: Nicolas Röhrs > Authorized signatory: Dr. Marius Feldmann > Authorized signatory: Kristina Rübenkamp > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus.hentsch at secustack.com Tue Jul 7 12:49:11 2020 From: markus.hentsch at secustack.com (Markus Hentsch) Date: Tue, 7 Jul 2020 14:49:11 +0200 Subject: [glance] Global Request ID issues in Glance In-Reply-To: References: <03b6180a-a287-818c-695e-42c006ce1347@secustack.com> Message-ID: Hi Abhishek, thanks for having a look! 
I've filed corresponding bug reports: Glance client: https://bugs.launchpad.net/python-glanceclient/+bug/1886650 Glance API: https://bugs.launchpad.net/glance/+bug/1886657 Best regards, Markus Abhishek Kekane wrote: > Hi Markus, > > Thank you for detailed analysis. > Both cases you pointed out are valid bugs. Could you please report > this to launchpad? > > Thanks & Best Regards, > > Abhishek Kekane > > > On Fri, Jun 26, 2020 at 6:33 PM Markus Hentsch > > > wrote: > > Hello everyone, > > while I was experimenting with the Global Request ID functionality of > OpenStack [1], I identified two issues in Glance related to this > topic. > I have written my findings below and would appreciate it if you could > take a look and confirm whether those are intended behaviors or indeed > issues with the implementation. > > In case of the latter please advice me which bug tracker to report > them > to. > > > 1. The Glance client does not correctly forward the global ID > > When the SessionClient class is used, the global_request_id is removed > from kwargs in the constructor using pop() [2]. Directly after this, > the parent constructor is called using super(), which in this case is > Adapter from the keystoneauth1 library. Therein the global_request_id > is set again [3] but since it has been removed from the kwargs, it > defaults to None as specified in the Adapter's __init__() header. > Thus, > the global_request_id passed to the SessionClient constructor never > actually makes it to the Glance API. This is in contrast to the > HTTPClient class, where get() is used instead of pop() [4]. > > This can be reproduced simply by creating a server in Nova from an > image in Glance, which will attempt to create the Glance client > instance using the global_request_id [5]. Passing the > "X-Openstack-Request-Id" header during the initial API call for the > server creation, makes it visible in Nova (using a suitable > "logging_context_format_string" setting) but it's not visible in > Glance. Using a Python debugger shows Glance generating a new local ID > instead. > > > 2. Glance interprets global ID as local one for Oslo Context objects > > While observing the Glance log file, I observed Glance always logging > the global_request_id instead of a local one if it is available. > > Using "%(global_request_id)s" within > "logging_context_format_string"[6] > in the glance-api.conf will always print "None" in the logs whereas > "%(request_id)s" will either be an ID generated by Glance if no global > ID is available or the received global ID. > > Culprit seems to be the context middleware of Glance where the global > ID in form of the "X-Openstack-Request-Id" header is parsed from the > request and passed as "request_id" instead of "global_request_id" to > the "glance.context.RequestContext.from_environ()" call [7]. > > This is in contrast to other services such as Nova or Neutron where > the two variables actually print the values according to their name > (request_id always being the local one, whereas global_request_id is > the global one or None). 
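To make the difference concrete, a stripped-down sketch of the pop()-versus-get() behaviour described in point 1 above (illustrative only, not the actual glanceclient or keystoneauth1 code):

    class Adapter(object):
        # stand-in for keystoneauth1.adapter.Adapter
        def __init__(self, global_request_id=None, **kwargs):
            # receives None when the caller already popped the value
            self.global_request_id = global_request_id

    class SessionClient(Adapter):
        def __init__(self, **kwargs):
            kwargs.pop('global_request_id', None)  # removed from kwargs...
            super().__init__(**kwargs)             # ...so the parent falls back to None

    class HTTPClient(object):
        def __init__(self, **kwargs):
            # get() leaves the key in place, so the ID survives
            self.global_request_id = kwargs.get('global_request_id')

    SessionClient(global_request_id='req-123').global_request_id  # None
    HTTPClient(global_request_id='req-123').global_request_id     # 'req-123'
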
> > > [1] > https://specs.openstack.org/openstack/oslo-specs/specs/pike/global-req-id.html > [2] > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L355 > [3] > https://github.com/openstack/keystoneauth/blob/dab8e1057ae8bb9a0e778fb8d3141ad4fb36a339/keystoneauth1/adapter.py#L166 > [4] > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L162 > [5] > https://github.com/openstack/nova/blob/1cae0cd7229207478b70275509aecd778ca69225/nova/image/glance.py#L78 > [6] > https://docs.openstack.org/oslo.context/2.17.0/user/usage.html#context-variables > [7] > https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/api/middleware/context.py#L201 > > > Thanks! > > Markus > > -- > Markus Hentsch > Team Leader > > secustack GmbH - Digital Sovereignty in the Cloud > https://www.secustack.com > Königsbrücker Straße 96 (Gebäude 30) | 01099 Dresden > District Court Dresden, Register Number: HRB 38890 > > -- Markus Hentsch Team Leader secustack GmbH - Digital Sovereignty in the Cloud https://www.secustack.com Königsbrücker Straße 96 (Gebäude 30) | 01099 Dresden District Court Dresden, Register Number: HRB 38890 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Tue Jul 7 14:09:09 2020 From: stephenfin at redhat.com (Stephen Finucane) Date: Tue, 07 Jul 2020 15:09:09 +0100 Subject: [nova] Changes for out-of-tree drivers Message-ID: <4ceab688ecde83dc4da8dda567a355e499cd8c6f.camel@redhat.com> I have a change proposed [1] as part of the work to add vTPM support to nova that will modify the arguments for the 'unrescue' function. As noted in the commit message, this is expected to gain a 'context' argument and lose the currently unused 'network_info' argument. If you maintain an out-of-tree driver, you will need to account for this change. Cheers, Stephen [1] https://review.opendev.org/#/c/730382/ From gagehugo at gmail.com Tue Jul 7 19:48:09 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Tue, 7 Jul 2020 14:48:09 -0500 Subject: [openstack-helm] Proposing Andrii Ostapenko for core of OpenStack-Helm Message-ID: Hello everyone, Andrii Ostapenko (andrii_ostapenko) has been very active lately in the openstack-helm community, notably his efforts in driving loci forward as well as him maintaining a lot of our images and providing great in-depth reviews. Due to these reasons, I am proposing Andrii as a core reviewer for OpenStack-Helm. If anyone has any feedback, please feel free to reply here by the end of the week! -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Jul 7 21:13:28 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 7 Jul 2020 14:13:28 -0700 Subject: [all][TC] New Office Hours Times In-Reply-To: References: Message-ID: Hello! I wanted to push this to the top of people's inboxes again. It looks like we are still missing several TC member's responses, and I would love some more community response as well since the office hours are FOR you! Please take a few min to fill out the survey for new office hours times[1]. -Kendall (diablo_rojo) [1] https://doodle.com/poll/q27t8pucq7b8xbme On Thu, Jul 2, 2020 at 2:52 PM Kendall Nelson wrote: > Hello! > > It's been a while since the office hours had been refreshed and we have a > lot of new people on the TC that were not around when the times were set. 
> > In an effort to stir things up a bit, and get more community engagement, > we are picking new times! > > I want to invite everyone in the community interested in interacting more > with the TC to respond to the poll so we have your input as the office > hours are really for your benefit anyway. (Nevermind the name of the poll > :) Too much work to remake the whole thing just to rename it..) > > That said, we do need responses from ALL TC members so that we can also > document who will (typically) be present for each office hour as well. > > (Also, thanks Mohammed for putting the poll together! It's no joke. ) > > -Kendall (diablo_rojo) > > [1] https://doodle.com/poll/q27t8pucq7b8xbme > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Tue Jul 7 21:53:35 2020 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 7 Jul 2020 16:53:35 -0500 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' In-Reply-To: References: Message-ID: <1635499f-ade9-07b4-191f-36ee431923dd@nemebean.com> On 7/2/20 2:23 AM, Radosław Piliszek wrote: > On Wed, Jul 1, 2020 at 10:31 PM Sean McGinnis wrote: >> >> On 7/1/20 2:24 PM, Hongbin Lu wrote: >>> Hi all, >>> >>> A short question. I saw a few projects are using the name 'ca_file' >>> [1] as config option, while others are using 'cafile' [2]. I wonder >>> what is the flavorite name convention? >>> >>> I asked this question because Kolla developer suggested Zun to rename >>> from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to >>> confirm if this is a good idea from Keystone's perspective. Thanks. >>> >>> Best regards, >>> Hongbin >>> >>> [1] >>> http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= >>> [2] >>> http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= >>> [3] https://review.opendev.org/#/c/738329/ >> >> Cinder and Glance both use ca_file (and ssl_ca_file and vmware_ca_file, >> and registry_client_ca_file). >> From keystone_auth, we do also have cafile. >> >> Personally, I find the separation of ca_file to be much easier to read. >> >> Sean >> >> > > Yeah, it was me to suggest the aliasing. We found that the 'cafile' > seems more prevalent. We missed that underscore for Zun and scratched > our heads "what are we doing wrong there?". Sounds like a job for https://docs.openstack.org/oslo.config/latest/cli/validator.html ;-) I don't have a strong opinion on which we should choose, but I will note that whichever it is, we can leave deprecated names for the other so nobody gets broken by the change. Probably incomplete lists of references to both names: http://codesearch.openstack.org/?q=StrOpt%5C(%27ca_file%27&i=nope&files=&repos= http://codesearch.openstack.org/?q=StrOpt%5C(%27cafile%27&i=nope&files=&repos= Unfortunately keystone and oslo.service differ, so no matter which we choose a lot of projects are going to inherit a deprecated opt. From juliaashleykreger at gmail.com Tue Jul 7 23:18:33 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 7 Jul 2020 16:18:33 -0700 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Greetings fellow humans, Following up, the consensus seems to have arrived at 2:30 PM UTC tomorrow (Wednesday). This is 7:30 US Pacific. We will use meetpad[1]. Thanks everyone! -Julia [1]: https://meetpad.opendev.org/ironic On Mon, Jul 6, 2020 at 9:15 AM Julia Kreger wrote: > > Greetings fellow humans! 
> > We had a great two hour session but we ran out of time to get back to > the discussion of a capability/driver support matrix. > > We agreed we should have a call later in the week to dive back into > the topic. I've created a doodle[1] for us to identify the best time > for a hopefully quick 30 minute call to try and reach consensus. > > Thanks everyone! > > -Julia > > [1]: https://doodle.com/poll/kte79im2tz4ape9v > > On Mon, Jul 6, 2020 at 6:12 AM Julia Kreger wrote: > > > > Greetings everyone! > > > > We'll use our meetpad[1]! > > > > -Julia > > > > [1]: https://meetpad.opendev.org/ironic > > > > On Mon, Jul 6, 2020 at 12:48 AM Dmitry Tantsur wrote: > > > > > > Hi all, > > > > > > Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. > > > > > > Dmitry > > > > > > On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: > > >> > > >> Hi all, > > >> > > >> Since we're switching to 6 releases per year cadence, I think it makes sense to have short virtual meetups after every release. The goal will be to sync on priorities, exchange ideas and define plans for the upcoming 2 months of development. Fooling around is also welcome! > > >> > > >> Please vote for the best 2 hours slot next week: https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more potential time zones, so apologies for so many options. Please cast your vote until Friday, 12pm UTC, so that I can announce the final time slot this week. > > >> > > >> Dmitry From juliaashleykreger at gmail.com Tue Jul 7 23:32:51 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 7 Jul 2020 16:32:51 -0700 Subject: OpenStack cluster event notification In-Reply-To: References: Message-ID: Greetings! Unfortunately, I don't know the eventing context outside of ironic's ability to emit them, but I'll try my best to answer questions with my context. On Tue, Jul 7, 2020 at 1:02 AM Anil Jangam wrote: > > Hi All, > > So far, based on my understanding of OpenStack Python SDK, I am able to read the Hypervisor, Servers instances, however, I do not see an API to receive and handle the change notification/events for the operations that happens on the cluster e.g. A new VM is added, an existing VM is deleted etc. > > I see a documentation, which talks about emitting notifications over a message bus that indicate different events that occur within the service. > > Notifications in OpenStack > > https://docs.openstack.org/ironic/latest/admin/notifications.html I suspect you may also find https://docs.openstack.org/nova/latest/reference/notifications.html useful. > > Does Openstack Python SDK support notification APIs? I'm going to guess the answer is no to this. As you noted earlier, the notifications are emitted to the message bus. These notifications can be read by a subscriber to the message bus itself, but this also means that the bus is directly connected to by some sort of messaging client. The Python SDK is intended for developers to use to leverage the REST APIs offered by services and components, not the message bus. > How do I receive/monitor notifications for VM related changes? > How do I receive/monitor notifications for compute/hypervisor related changes? > How do I receive/monitor notifications for Virtual Switch related changes? I think what you are looking for is ceilometer. 
https://docs.openstack.org/ceilometer/latest/admin/telemetry-data-collection.html#notifications Although that being said, I don't think much would really prevent you from consuming the notifications directly from the message bus, if you so desire. Maybe someone already has some code for this on hand. > > Thanks in advance for any help in this regard. > Hope this helped. > /anil. > -Julia From gouthampravi at gmail.com Wed Jul 8 00:18:48 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Tue, 7 Jul 2020 17:18:48 -0700 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <20200623102448.eocahkszcd354b5d@skaplons-mac> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: On Tue, Jun 23, 2020 at 3:32 AM Slawek Kaplonski wrote: > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend > in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, > neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by > the > original authors of OVS. It is an attempt to re-do the ML2/OVS control > plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different > architecture, > moving us away from Python agents communicating with the Neutron API > service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the > main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are > built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those > gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack > configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and > it > works fine. Recently (See [7]) we also proposed to add an OVN based job to > the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein > cycle > and there were no any significant issues related strictly to this change. > It > worked well for other projects. 
> Of course in the Neutron project we will be still gating other drivers, > like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the > names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make > such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this > thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron > repo to > Devstack repo - we don’t want to force everyone else to add > “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron > with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main > OpenStack > projects to check if it will not break anything in the gate of those > projects. > +1 This plan looks great. We test Neutron integration quite a bit in OpenStack Manila devstack jobs and in third party CI associated with the project. We've tested OVN in the past and noticed it made share server provisioning faster and more reliable. So I don't think we would be affected negatively should you change the default mechanism and driver. However, please keep us in mind, and perhaps alert me when you post patches so we can test everything is okay. > 5. If all will be running fine, merge patches proposed in points 3 and 3a. > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thierry at openstack.org Wed Jul 8 08:51:01 2020 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 8 Jul 2020 10:51:01 +0200 Subject: [largescale-sig] Next meeting: July 8, 8utc In-Reply-To: <41af7bd5-5aaa-566d-a99c-dc19873b2422@openstack.org> References: <41af7bd5-5aaa-566d-a99c-dc19873b2422@openstack.org> Message-ID: Meeting logs at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-07-08-08.00.html TODOs: - ttx to identify from the chat interested candidates from Opendev event and invite them to next meeting - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - amorin to start a thread on osarchiver proposing to land it somewhere in openstack - amorin to start a [largescale-sig] thread about his middleware ping approach, SIG members can comment if that makes sense for them Next meeting: Jul 22, 8:00UTC on #openstack-meeting-3 -- Thierry Carrez (ttx) From reza.b2008 at gmail.com Wed Jul 8 11:39:48 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Wed, 8 Jul 2020 16:09:48 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails Message-ID: Hi, I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} Any suggestion would be grateful. Regards, Reza -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Wed Jul 8 12:05:29 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Wed, 08 Jul 2020 14:05:29 +0200 Subject: OpenStack cluster event notification In-Reply-To: References: Message-ID: <59G5DQ.5J8F1FJXF7IT3@est.tech> On Tue, Jul 7, 2020 at 16:32, Julia Kreger wrote: [snip] > > Although that being said, I don't think much would really prevent you > from consuming the notifications directly from the message bus, if you > so desire. Maybe someone already has some code for this on hand. Here is some example code that forwards the nova versioned notifications from the message bus out to a client via websocket [1]. I used this sample code in my demo [2] during a summit presentation. 
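The pattern behind such a forwarder is an oslo.messaging notification listener. A minimal sketch, with an assumed transport URL and the default 'versioned_notifications' topic (adjust both to your deployment; this is not the code from the linked demo):

    from oslo_config import cfg
    import oslo_messaging

    cfg.CONF([])  # no config files needed for this sketch

    transport = oslo_messaging.get_notification_transport(
        cfg.CONF, url='rabbit://stackrabbit:secret@127.0.0.1:5672/')
    targets = [oslo_messaging.Target(topic='versioned_notifications')]

    class NotificationEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # e.g. event_type == 'instance.create.end'
            print(event_type, payload)

    listener = oslo_messaging.get_notification_listener(
        transport, targets, [NotificationEndpoint()], executor='threading')
    listener.start()
    listener.wait()

This assumes nova is configured to emit versioned notifications (notification_format set to 'versioned' or 'both'); the endpoint then sees the instance lifecycle events on the bus.
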
Cheers, gibi [1] https://github.com/gibizer/nova-notification-demo/blob/master/ws_forwarder.py [2] https://www.youtube.com/watch?v=WFq5JWXa9AM From laszlo.budai at gmail.com Wed Jul 8 14:21:53 2020 From: laszlo.budai at gmail.com (Budai Laszlo) Date: Wed, 8 Jul 2020 17:21:53 +0300 Subject: [Neutron] GRE network MTU Message-ID: <7b9a4951-8451-e495-d582-ef6eec15182c@gmail.com> Dear all, what is the maximum MTU value for a GRE network? How is that related to the physical interfaces' MTU? Thank you, Laszlo From marek.lycka at ultimum.io Wed Jul 8 14:45:26 2020 From: marek.lycka at ultimum.io (=?UTF-8?B?TWFyZWsgTHnEjWth?=) Date: Wed, 8 Jul 2020 16:45:26 +0200 Subject: [Cinder] Message-ID: Hi all, I'm currently looking into extending Nova API to allow on-demand VM quiescing with the ultimate goal being improved Cinder snapshot creation. The spec is undergoing review at the moment and I was wondering if someone from Cinder would be kind enough to look it over and give their thoughts on it: https://review.opendev.org/#/c/702810/ Thanks in advance. -- Marek Lyčka Linux Developer Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic marek.lycka at ultimum.io *https://ultimum.io * -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Wed Jul 8 15:13:42 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 8 Jul 2020 09:13:42 -0600 Subject: [tripleo] updating the language we use in our code base Message-ID: Greetings, At this moment in time we have an opportunity to be a more open and inclusive project by eliminating outdated naming conventions from our code base [1]. We should take the opportunity and do our best to replace outdated terms with their more inclusive alternatives. Chris Wright wrote a nice blog post on the subject [2], please take a second to review Bogdan's spec and Chris's blog post. Also a thank you to Emilien, Alex and Bogdan for already getting started. In other news Arx Cruz will be starting a similar thread for the tempest project. Thanks Arx! [1] https://review.opendev.org/#/c/740013/1/specs/victoria/renaming_rules.rst [2] https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language Patches to be aware of: https://review.opendev.org/#/c/738858/ https://review.opendev.org/#/c/738894/ https://review.opendev.org/#/c/740013 Thanks for your time! -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Wed Jul 8 18:20:46 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 8 Jul 2020 11:20:46 -0700 Subject: [neutron][lbaas][octavia] How to implement health check using 100.64.0.0/10 network segments in loadbalancer? In-Reply-To: <45299896.5594.1732836be3d.Coremail.zhengyupann@163.com> References: <45299896.5594.1732836be3d.Coremail.zhengyupann@163.com> Message-ID: Hi Zhengyu, I'm not sure I understand your question, so I'm going to take a guess. Correct me if I am answering the wrong question. First question I have is are you using the EOL neutron-lbaas or Octavia? I will assume you are using Octavia. When you add a member server (backend web server for example), you have a few options: 1. If you create a member without the "subnet_id" option, the load balancer will attempt to route to the member IP address over the VIP subnet. Health monitor checks will also follow this route. 2. 
If, when you create the member, you specify the "subnet_id" option to a valid neutron subnet, the load balancer will be attached to that subnet and will route to the member IP address. If you do not specify a "monitor_address", health monitoring will follow the same route as the member IP address. 3. If you create a member, with the "monitor_address" specified, traffic will be routed to the member IP address, but health monitoring checks will be directed to the "monitor_address". To give an example: Say you have a neutron network 436f58c2-0454-49dc-888e-eaafdd178577 with a subnet of e6e46e02-7768-4ae4-89c6-314c34557b5d with CIDR 100.64.0.0/14 on it. When creating the pool member is created you would specify something like: openstack loadbalancer member create --address 100.64.100.5 --subnet-id e6e46e02-7768-4ae4-89c6-314c34557b5d --protocol-port 80 This will attach the neutron network 436f58c2-0454-49dc-888e-eaafdd178577 to the load balancer and allocate an IP address on 100.64.0.0/14 that will be used to contact the member server address of 100.64.100.5. Health monitor checks will also follow this same path to the member server. We have some documentation for this in the cookbook here: https://docs.openstack.org/octavia/latest/user/guides/basic-cookbook.html#deploy-a-basic-http-load-balancer-with-a-health-monitor I hope this helps clarify, Michael On Tue, Jul 7, 2020 at 12:45 AM Zhengyu Pan wrote: > > > There are some private cloud or public cloud introduction: They use 100.64.0.0/14 network segments to check vm's health status in load balancer. In Region supporting VPC, load balancing private network IP and health check IP will be switched to 100 network segment. I can't understand how to implement it. How to do it? > > > > > > -- > > > > > From rosmaita.fossdev at gmail.com Wed Jul 8 21:14:51 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 8 Jul 2020 17:14:51 -0400 Subject: [ops][cinder] festival of EOL - ocata and pike Message-ID: Lee Yarwood recently announced the change to 'unmaintained' status of nova stable/ocata [0] and stable/pike [1] branches, with the clever idea of back-dating the 6 month period of un-maintenance to the most recent commit to each branch. I took a look at cinder stable/ocata and stable/pike, and the most recent commit to each is 8 months ago and 7 months ago, respectively. The Cinder team discussed this at today's Cinder meeting and agreed that this email will serve as notice to the OpenStack Community that the following openstack/cinder branches have been in 'unmaintained' status for the past 6 months: - stable/ocata - stable/pike The Cinder team hereby serves notice that it is our intent to ask the openstack infra team to tag each as EOL at its current HEAD and delete the branches two weeks from today, that is, on Wednesday, 22 July 2020. (This applies also to the other stable-branched cinder repositories, that is, os-brick, python-cinderclient, and python-cinderclient-extension.) Please see [2] for information about the maintenance phases and what action would need to occur before 22 July for a branch to be adopted back to the 'extended maintenance' phase. On behalf of the Cinder team, thank you for your attention to this matter. 
cheers, brian [0] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html [2] https://docs.openstack.org/project-team-guide/stable-branches.html From sunny at openstack.org Wed Jul 8 20:48:00 2020 From: sunny at openstack.org (Sunny Cai) Date: Wed, 8 Jul 2020 13:48:00 -0700 Subject: July OSF Community Meeting - 10 Years of OpenStack Message-ID: Hello everyone, You might have heard that OpenStack is turning 10 this year! On Thursday, July 16 at 8am PT (1500 UTC), we will be holding the 10 years of OpenStack virtual celebration in the July OSF community meeting. I have attached the calendar invite for the July OSF community meeting below. Grab your favorite OpenStack swag and bring your favorite drinks of choice to the meeting on July 16. Let’s do a virtual toast to the 10 incredible years! Please see the etherpad for more meeting information: https://etherpad.opendev.org/p/tTP9ilsAaJ2E8vMnm6uV If you have any questions, please let me know. P.S. To add more fun, feel free to try out the virtual background feature in Zoom. The 10 years of OpenStack virtual background is attached below. Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 10 Years of OpenStack Community Meeting meeting.ics Type: text/calendar Size: 1788 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 10 Years Virtual Background.jpg Type: image/jpeg Size: 530089 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From anilj.mailing at gmail.com Thu Jul 9 06:17:19 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Wed, 8 Jul 2020 23:17:19 -0700 Subject: OpenStack cluster event notification In-Reply-To: <59G5DQ.5J8F1FJXF7IT3@est.tech> References: <59G5DQ.5J8F1FJXF7IT3@est.tech> Message-ID: Thanks Julia for comments. Also thanks Gibi for the github link and sharing the example. I will take a look and adopt it. On Wed, Jul 8, 2020 at 5:05 AM Balázs Gibizer wrote: > > > On Tue, Jul 7, 2020 at 16:32, Julia Kreger > wrote: > [snip] > > > > > Although that being said, I don't think much would really prevent you > > from consuming the notifications directly from the message bus, if you > > so desire. Maybe someone already has some code for this on hand. > > Here is some example code that forwards the nova versioned > notifications from the message bus out to a client via websocket [1]. I > used this sample code in my demo [2] during a summit presentation. > > Cheers, > gibi > > [1] > > https://github.com/gibizer/nova-notification-demo/blob/master/ws_forwarder.py > [2] https://www.youtube.com/watch?v=WFq5JWXa9AM > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.schaefer at cloudandheat.com Thu Jul 9 06:53:26 2020 From: jonas.schaefer at cloudandheat.com (Jonas =?ISO-8859-1?Q?Sch=E4fer?=) Date: Thu, 09 Jul 2020 08:53:26 +0200 Subject: [neutron] bandwidth metering based on remote address In-Reply-To: References: <2890841.xduM2AgYMW@antares> Message-ID: <25308951.foNqEPruJI@antares> Hello Rafael, On Dienstag, 7. 
Juli 2020 14:09:29 CEST Rafael Weingärtner wrote: > Hallo Jonas, > I have worked to address this specific use case. > > First, the part of the solution that is already implemented. If you only > need to gather metrics in a tenant fashion, you can take a look into this > PR: https://review.opendev.org/#/c/735605/. That pull request enables > operators to configure shared traffic labels, and then, these traffic > labels will be exposed/published with different granularities. The > different granularities are router, tenant, label, router-label, and > tenant-label. The complete explanation can be found in the "RST" document > that the PR also introduces, where we wrote a complete description of > neutron metering, its configs, and usage. You are welcome to review and > help us get this PR merged :) This already looks very useful to us, since it saves us from creating labels for each and every project. > So far, if all you need is to measure the whole traffic, but in different > granularities, that PR will probably be enough. Not quite; as mentioned, we’ll need to carve out specific network areas from metering, those which are in our DCs, but on the other side of the router from the customer perspective. > On the other hand, if you > need to create more complex rules to filter by source/destination IPs, then > we need something else. Interestingly enough, we are working towards that. > We will extend neutron API, and neutron metering to allow operators to use > "remote-ip" and "source-ip" to create metering labels rules. That sounds exactly like what we’d need. > We also saw the PR that changed the behavior of the "remote-ip" property, > and the whole confusion it caused (at least for us). However, instead of > proposing to revert it, we are working towards enabling the API to handle > "remote-ip" and "source-ip", which will cover the use case of the person > that introduced that commit, and many others such as ours and yours > (probably). Sounds good. Is there a way we can collaborate on this? Is there a launchpad bug which tracks that? (Also, is there a launchpad thing for the shared label granularity you’re doing already? I didn’t find one mentioned on the gerrit page.) kind regards, Jonas Schäfer > > On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < > > jonas.schaefer at cloudandheat.com> wrote: > > Dear list, > > > > We are trying to implement tenant bandwidth metering at the neutron router > > level. Since some of the network spaces connected to the external > > interface of > > the neutron router are supposed to be unmetered, we need to match on the > > remote address. > > > > Conveniently, there exists a --remote-ip-prefix option on meter label > > create; > > however, since [1], its meaning was changed to the exact opposite: Instead > > of > > matching on the *remote* prefix (towards the external interface), it > > matches > > on the *local* prefix (towards the OS tenant network). > > > > In an ideal world, we would want to revert that change and instead > > introduce a > > --local-ip-prefix option which covers that use-case. I suppose this is not > > a > > thing we *should* do though, given that this change made it into a few > > releases already. > > > > Instead, we’ll have to create a new option (which whatever name) + > > associated > > database schema + iptables rule patterns to implement the feature. > > > > The questions associated with this are now: > > > > - Does this make absolutely no sense to anyone? > > - What is the process for this? 
I suppose since this change was made > > intentionally and passed review, our desired change needs to go through a > > feature request process (blueprints maybe?). > > > > kind regards, > > Jonas Schäfer > > > > [1]: https://opendev.org/openstack/neutron/commit/ > > > > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 > > > > > > -- > > Jonas Schäfer > > DevOps Engineer > > > > Cloud&Heat Technologies GmbH > > Königsbrücker Straße 96 | 01099 Dresden > > +49 351 479 367 37 > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > > > New Service: > > Managed Kubernetes designed for AI & ML > > https://managed-kubernetes.cloudandheat.com/ > > > > Commercial Register: District Court Dresden > > Register Number: HRB 30549 > > VAT ID No.: DE281093504 > > Managing Director: Nicolas Röhrs > > Authorized signatory: Dr. Marius Feldmann > > Authorized signatory: Kristina Rübenkamp -- Jonas Schäfer DevOps Engineer Cloud&Heat Technologies GmbH Königsbrücker Straße 96 | 01099 Dresden +49 351 479 367 37 jonas.schaefer at cloudandheat.com | www.cloudandheat.com New Service: Managed Kubernetes designed for AI & ML https://managed-kubernetes.cloudandheat.com/ Commercial Register: District Court Dresden Register Number: HRB 30549 VAT ID No.: DE281093504 Managing Director: Nicolas Röhrs Authorized signatory: Dr. Marius Feldmann Authorized signatory: Kristina Rübenkamp -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. URL: From anilj.mailing at gmail.com Thu Jul 9 08:22:22 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Thu, 9 Jul 2020 01:22:22 -0700 Subject: Hardware requirement for OpenStack HA Cluster Message-ID: Hi All, I am looking for hardware requirements (CPU, RAM, HDD) for installing a OpenStack HA cluster. So far, I gathered few references: - This article talks about CPU and HDD, but they do not comment on RAM. - https://docs.openstack.org/project-deploy-guide/openstack-ansible/ocata/overview-requirements.html - This article talks about CPU, RAM, and HDD, but it is quite old (2015) reference. - https://docs.huihoo.com/openstack/docs.openstack.org/ha-guide/HAGuide.pdf (Page 6) I am considering the cluster with: 3 Controller (for HA) + 1 Compute + 1 Storage. I have following questions: - What is the minimum hardware (CPU, RAM, HDD) requirement to install a OpenStack HA cluster? - Can we have 3 Controller nodes installed on 3 Virtual Machines or do we need 3 independent (bare metal) servers? - So in case of VM-based controllers, the cluster will be hybrid in nature. - I do not know if this is even possible and a recommended design. - Do we need the Platform Director node in addition to controller and compute/storage nodes? Thanks in advance. Anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Jul 9 09:15:18 2020 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 9 Jul 2020 11:15:18 +0200 Subject: [Cinder] [Nova] Quiescing In-Reply-To: References: Message-ID: <20200709091518.r5usx2x3lnejvqmh@localhost> On 08/07, Marek Lyčka wrote: > Hi all, > > I'm currently looking into extending Nova API to allow on-demand VM > quiescing > with the ultimate goal being improved Cinder snapshot creation. 
The spec is > undergoing > review at the moment and I was wondering if someone from Cinder would be > kind > enough to look it over and give their thoughts on it: > > https://review.opendev.org/#/c/702810/ > > Thanks in advance. > Hi Marek, I'm really glad to hear somebody will be working on this functionality. I have reviewed the spec and Cinder (and probably anyone using the feature) needs a REST API to query the current state of the quiesce, unless the quiesce call is actually synchronous and doesn't return until it's done. Cheers, Gorka. > -- > Marek Lyčka > Linux Developer > > Ultimum Technologies a.s. > Na Poříčí 1047/26, 11000 Praha 1 > Czech Republic > > marek.lycka at ultimum.io > *https://ultimum.io * From skaplons at redhat.com Thu Jul 9 10:08:52 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 12:08:52 +0200 Subject: [neutron] PTL on vacation Message-ID: <86027844-7D61-4D04-9A14-559D56BBEDEA@redhat.com> Hi, For the next 2 weeks, starting Saturday 11th of July I will be on vacation without access to the irc and with very limited access to the email. Miguel Lavalle will run our team meetings during this time. With other things You can always ask one of our drivers [1] or lieutenants [1] https://review.opendev.org/#/admin/groups/464,members [2] https://docs.openstack.org/neutron/latest/contributor/policies/neutron-teams.html#neutron-lieutenants — Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Thu Jul 9 10:43:41 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 12:43:41 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <1594028941528.18866@binero.com> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <1594028941528.18866@binero.com> Message-ID: <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> Hi, Thx for Your feedback. > On 6 Jul 2020, at 11:49, Tobias Urdin wrote: > > Hello Slawek, > This is very interesting and I think this is the right way to go, speakin from an operator standpoint here. > > We've started investing time in getting familiar with OVN, how to operate and how to troubleshoot and > are looking forward into offloading a lot of work to OVN in the future. > > We are closely looking how we can integrate hardware offloading with OVN+OVS to improve our performance > and in the future looking to the new VirtIO backend support for vDPA that has started to mature more. > > From an operator's view, after getting familiar with OVN, there is a lot of work that needs to be done behind > the scenes in order to get to the desired point. > > * Geneve offloading on NIC, we might need new NICs or new firmware. > * We need to migrate away from VXLAN to Geneve encapsulation, how can we migrate our current baremetal approach > * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red Hat has driven some work to perform this (an Geneve migration) but there is minimal testing or real world deployments that has tried or documented the approach. Yes, that’s definitely something which will require more work. > * And then all misc stuff, we need to look into the new ovn-metadata-agent, should we move Octavia over to OVN yet? For octavia, there is ovn-octavia provider: https://opendev.org/openstack/ovn-octavia-provider which You can use with OVN instead of using Amphora > > Then the final, what do we gain vs what do we lose in terms of maintainability, performance and features. 
We have document https://docs.openstack.org/neutron/latest/ovn/gaps.html which should describe most of the gaps between ML2/OVS and ML2/OVN backends. We are working on closing those gaps but please also keep in mind that ML2/OVS is not going anywhere, if You need any of features from it, You can still use it as it still is and will be maintained backend :) > > But form an operator's view, I'm very positive to the future of a OVN integrated OpenStack. Thx. I really appreciate this. > > Best regards > Tobias > ________________________________________ > From: Slawek Kaplonski > Sent: Tuesday, June 23, 2020 12:24 PM > To: OpenStack Discuss ML > Cc: Assaf Muller; Daniel Alvarez Sanchez > Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend > > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by the > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different architecture, > moving us away from Python agents communicating with the Neutron API service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and it > works fine. Recently (See [7]) we also proposed to add an OVN based job to the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein cycle > and there were no any significant issues related strictly to this change. It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of > some of the jobs. 
> The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to > Devstack repo - we don’t want to force everyone else to add “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack > projects to check if it will not break anything in the gate of those projects. > 5. If all will be running fine, merge patches proposed in points 3 and 3a. > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > > > — Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Thu Jul 9 10:45:19 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 12:45:19 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: <54AB1156-80B5-442A-8281-3C1165561926@redhat.com> Hi, > On 8 Jul 2020, at 02:18, Goutham Pacha Ravi wrote: > > > > > > On Tue, Jun 23, 2020 at 3:32 AM Slawek Kaplonski wrote: > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by the > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different architecture, > moving us away from Python agents communicating with the Neutron API service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main > Neutron repository. > > > Why? 
> ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and it > works fine. Recently (See [7]) we also proposed to add an OVN based job to the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein cycle > and there were no any significant issues related strictly to this change. It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to > Devstack repo - we don’t want to force everyone else to add “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack > projects to check if it will not break anything in the gate of those projects. > > +1 This plan looks great. We test Neutron integration quite a bit in OpenStack Manila devstack jobs and in third party CI associated with the project. We've tested OVN in the past and noticed it made share server provisioning faster and more reliable. So I don't think we would be affected negatively should you change the default mechanism and driver. However, please keep us in mind, and perhaps alert me when you post patches so we can test everything is okay. We will for sure alert others to check this in their project when it will be ready. For now Lucas is still working on patches to move ovn bits to the Devstack repo. > > 5. If all will be running fine, merge patches proposed in points 3 and 3a. 
> > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Principal software engineer Red Hat From cgoncalves at redhat.com Thu Jul 9 11:13:09 2020 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Thu, 9 Jul 2020 12:13:09 +0100 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <1594028941528.18866@binero.com> <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> Message-ID: On Thu, Jul 9, 2020 at 11:45 AM Slawek Kaplonski wrote: > Hi, > > Thx for Your feedback. > > > On 6 Jul 2020, at 11:49, Tobias Urdin wrote: > > > > Hello Slawek, > > This is very interesting and I think this is the right way to go, > speakin from an operator standpoint here. > > > > We've started investing time in getting familiar with OVN, how to > operate and how to troubleshoot and > > are looking forward into offloading a lot of work to OVN in the future. > > > > We are closely looking how we can integrate hardware offloading with > OVN+OVS to improve our performance > > and in the future looking to the new VirtIO backend support for vDPA > that has started to mature more. > > > > From an operator's view, after getting familiar with OVN, there is a lot > of work that needs to be done behind > > the scenes in order to get to the desired point. > > > > * Geneve offloading on NIC, we might need new NICs or new firmware. > > * We need to migrate away from VXLAN to Geneve encapsulation, how can we > migrate our current baremetal approach > > * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red > Hat has driven some work to perform this (an Geneve migration) but there is > minimal testing or real world deployments that has tried or documented the > approach. > > Yes, that’s definitely something which will require more work. > > > * And then all misc stuff, we need to look into the new > ovn-metadata-agent, should we move Octavia over to OVN yet? > > For octavia, there is ovn-octavia provider: > https://opendev.org/openstack/ovn-octavia-provider which You can use with > OVN instead of using Amphora > Before an attempt at moving from amphora to OVN load balancers, it's worth considering all the existing feature limitations of the OVN provider. OVN load balancers do not support a large feature set typically available in other load balancer solutions. For example, OVN does not support: - Round-robin, weighted round-robin, least connection, source IP, etc. It does only support one balancing algorithm: source IP-Port - HTTP, HTTPS, Proxy protocols. OVN only supports TCP and UDP with limited capabilities (e.g. no timeout knobs) - TLS termination - TLS client authentication - TLS backend encryption - Layer 7 features and header manipulation - Health monitors (WIP) - Octavia flavors - Statistics - Mixed IPv6 and IPv4 VIPs and members. 
More details in https://docs.openstack.org/octavia/latest/user/feature-classification/index.html > > > > > Then the final, what do we gain vs what do we lose in terms of > maintainability, performance and features. > > We have document https://docs.openstack.org/neutron/latest/ovn/gaps.html > which should describe most of the gaps between ML2/OVS and ML2/OVN backends. > We are working on closing those gaps but please also keep in mind that > ML2/OVS is not going anywhere, if You need any of features from it, You can > still use it as it still is and will be maintained backend :) > > > > > But form an operator's view, I'm very positive to the future of a OVN > integrated OpenStack. > > Thx. I really appreciate this. > > > > > Best regards > > Tobias > > ________________________________________ > > From: Slawek Kaplonski > > Sent: Tuesday, June 23, 2020 12:24 PM > > To: OpenStack Discuss ML > > Cc: Assaf Muller; Daniel Alvarez Sanchez > > Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron > Backend > > > > Hi, > > > > The Neutron team wants to propose a switch of the default Neutron > backend in > > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, > neutron-l3-agent) to > > OVN with its own ovn-metadata-agent and ovn-controller. > > We discussed that change during the virtual PTG - see [1]. > > In this document we want to explain reasons why we want to do that > change. > > > > > > OVN in 75 Words > > --------------- > > > > Open Virtual Network is managed under the OVS project, and was created > by the > > original authors of OVS. It is an attempt to re-do the ML2/OVS control > plane, > > using lessons learned throughout the years. It is intended to be used in > > projects such as OpenStack and Kubernetes. OVN has a different > architecture, > > moving us away from Python agents communicating with the Neutron API > service > > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > > > Here’s a heap of information about OpenStack’s integration of OVN: > > * OpenStack Boston Summit talk on OVN [2] > > * Upstream OpenStack networking-ovn documentation [3] and [4] > > * OSP 13 OVN documentation, including how to install it using Director > [5] > > > > Neutron OVN driver was developed as a Neutron stadium project, > > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into > the main > > Neutron repository. > > > > > > Why? > > ---- > > > > In the Neutron team we believe that OVN and the Neutron OVN driver are > built > > with a modern architecture that offers better foundations for a simpler > and > > more performant solution. We see increased participation in > kubernetes-ovn, > > resulting in a larger core OVN community, and we would like OpenStack to > > benefit from this Kubernetes driven OVN investment. > > Neutron OVN driver currently has got some feature parity gaps comparing > to > > ML2/OVS (see [6] for details) but our team is working hard to close > those gaps > > and we believe that this driver is the future for Neutron and that’s why > we > > want to make it the default Neutron ML2 backend in the Devstack > configuration. > > > > > > What Does it Mean? > > ------------------ > > > > Since most Openstack projects use Neutron in their CI and gate jobs, this > > change has the potential for a large impact. > > But this backend is already tested with various jobs in the Neutron CI > and it > > works fine. Recently (See [7]) we also proposed to add an OVN based job > to the > > Devstack’s check queue. 
> > Similarly the default Neutron backend in TripleO was changed in the > Stein cycle > > and there were no any significant issues related strictly to this > change. It > > worked well for other projects. > > Of course in the Neutron project we will be still gating other drivers, > like > > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the > names of > > some of the jobs. > > The Neutron team is *NOT* going to deprecate any of the other existing > ML2 > > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > > drivers in the same way as it is now. > > > > > > Action Plan > > ----------- > > > > We want to make this change before the Victoria-2 milestone to not make > such > > changes too late in the release cycle. Our action plan is as below: > > > > 1. Share the plan and get feedback from the upstream community (this > thread) > > 2. Move OVN related Devstack code from a plugin defined in the Neutron > repo to > > Devstack repo - we don’t want to force everyone else to add > “enable_plugin > > neutron” in their local.conf file to use default Neutron backend, > > 3. Switch default Neutron backend in Devstack to be OVN, > > a. Switch definition of base devstack CI jobs that it will run Neutron > with > > OVN backend, > > 4. Propose DNM patches depend on patch from point 3 and 3a to main > OpenStack > > projects to check if it will not break anything in the gate of those > projects. > > 5. If all will be running fine, merge patches proposed in points 3 and > 3a. > > > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - > 193 > > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > > [5] > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > > [7] https://review.opendev.org/#/c/736021/ > > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > > > > > > > — > Slawek Kaplonski > Principal software engineer > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Thu Jul 9 11:53:29 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 9 Jul 2020 08:53:29 -0300 Subject: [neutron] bandwidth metering based on remote address In-Reply-To: <25308951.foNqEPruJI@antares> References: <2890841.xduM2AgYMW@antares> <25308951.foNqEPruJI@antares> Message-ID: I created a bug track for the extension of the neutron metering granularities: https://bugs.launchpad.net/neutron/+bug/1886949 I am never sure about those "paper work", I normally propose the pull requests, and wait for the guidance of the community. About the source/destination filtering, I have not published anything yet. So far, we defined/specified what we need/want from the Neutron metering sub-system. Next week I am supposed to start on this matter. Therefore, as soon as I have updates, I will create the bug report, and pull requests. You can help me now by reviewing the PR I already have open, and of course, testing/using it :) On Thu, Jul 9, 2020 at 3:54 AM Jonas Schäfer < jonas.schaefer at cloudandheat.com> wrote: > Hello Rafael, > > On Dienstag, 7. Juli 2020 14:09:29 CEST Rafael Weingärtner wrote: > > Hallo Jonas, > > I have worked to address this specific use case. 
> > > > First, the part of the solution that is already implemented. If you only > > need to gather metrics in a tenant fashion, you can take a look into this > > PR: https://review.opendev.org/#/c/735605/. That pull request enables > > operators to configure shared traffic labels, and then, these traffic > > labels will be exposed/published with different granularities. The > > different granularities are router, tenant, label, router-label, and > > tenant-label. The complete explanation can be found in the "RST" document > > that the PR also introduces, where we wrote a complete description of > > neutron metering, its configs, and usage. You are welcome to review and > > help us get this PR merged :) > > This already looks very useful to us, since it saves us from creating > labels > for each and every project. > > > So far, if all you need is to measure the whole traffic, but in different > > granularities, that PR will probably be enough. > > Not quite; as mentioned, we’ll need to carve out specific network areas > from > metering, those which are in our DCs, but on the other side of the router > from > the customer perspective. > > > On the other hand, if you > > need to create more complex rules to filter by source/destination IPs, > then > > we need something else. Interestingly enough, we are working towards > that. > > We will extend neutron API, and neutron metering to allow operators to > use > > "remote-ip" and "source-ip" to create metering labels rules. > > That sounds exactly like what we’d need. > > > We also saw the PR that changed the behavior of the "remote-ip" > property, > > and the whole confusion it caused (at least for us). However, instead of > > proposing to revert it, we are working towards enabling the API to handle > > "remote-ip" and "source-ip", which will cover the use case of the person > > that introduced that commit, and many others such as ours and yours > > (probably). > > Sounds good. Is there a way we can collaborate on this? Is there a > launchpad > bug which tracks that? (Also, is there a launchpad thing for the shared > label > granularity you’re doing already? I didn’t find one mentioned on the > gerrit > page.) > > kind regards, > Jonas Schäfer > > > > > On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < > > > > jonas.schaefer at cloudandheat.com> wrote: > > > Dear list, > > > > > > We are trying to implement tenant bandwidth metering at the neutron > router > > > level. Since some of the network spaces connected to the external > > > interface of > > > the neutron router are supposed to be unmetered, we need to match on > the > > > remote address. > > > > > > Conveniently, there exists a --remote-ip-prefix option on meter label > > > create; > > > however, since [1], its meaning was changed to the exact opposite: > Instead > > > of > > > matching on the *remote* prefix (towards the external interface), it > > > matches > > > on the *local* prefix (towards the OS tenant network). > > > > > > In an ideal world, we would want to revert that change and instead > > > introduce a > > > --local-ip-prefix option which covers that use-case. I suppose this is > not > > > a > > > thing we *should* do though, given that this change made it into a few > > > releases already. > > > > > > Instead, we’ll have to create a new option (which whatever name) + > > > associated > > > database schema + iptables rule patterns to implement the feature. > > > > > > The questions associated with this are now: > > > > > > - Does this make absolutely no sense to anyone? 
> > > - What is the process for this? I suppose since this change was made > > > intentionally and passed review, our desired change needs to go > through a > > > feature request process (blueprints maybe?). > > > > > > kind regards, > > > Jonas Schäfer > > > > > > [1]: https://opendev.org/openstack/neutron/commit/ > > > > > > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 > > > > > > > > > -- > > > Jonas Schäfer > > > DevOps Engineer > > > > > > Cloud&Heat Technologies GmbH > > > Königsbrücker Straße 96 | 01099 Dresden > > > +49 351 479 367 37 > > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > > > > > New Service: > > > Managed Kubernetes designed for AI & ML > > > https://managed-kubernetes.cloudandheat.com/ > > > > > > Commercial Register: District Court Dresden > > > Register Number: HRB 30549 > > > VAT ID No.: DE281093504 > > > Managing Director: Nicolas Röhrs > > > Authorized signatory: Dr. Marius Feldmann > > > Authorized signatory: Kristina Rübenkamp > > > -- > Jonas Schäfer > DevOps Engineer > > Cloud&Heat Technologies GmbH > Königsbrücker Straße 96 | 01099 Dresden > +49 351 479 367 37 > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > New Service: > Managed Kubernetes designed for AI & ML > https://managed-kubernetes.cloudandheat.com/ > > Commercial Register: District Court Dresden > Register Number: HRB 30549 > VAT ID No.: DE281093504 > Managing Director: Nicolas Röhrs > Authorized signatory: Dr. Marius Feldmann > Authorized signatory: Kristina Rübenkamp > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Jul 9 12:40:11 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 14:40:11 +0200 Subject: [neutron] Drivers meeting - agenda for 10.07.2020 Message-ID: <772B957B-86AB-4BD7-A594-60932F2A8304@redhat.com> Hi, For tomorrow’s meeting we have 2 RFEs to discuss: https://bugs.launchpad.net/neutron/+bug/1886798 - [RFE] Port NUMA affinity policy - this one was already discussed briefly few weeks ago - see http://eavesdrop.openstack.org/meetings/neutron_drivers/2020/neutron_drivers.2020-06-19-14.00.log.html#l-17 - but now as Rodolfo proposed official RFE, lets talk again about it, https://bugs.launchpad.net/neutron/+bug/1880532 - [RFE]L3 Router should support ECMP - this one was also discussed some time ago, owner of the rfe provided some additional info recently so please take a look into that and we will also discuss that tomorrow. Have a great day and see You on tomorrow’s meeting :) — Slawek Kaplonski Principal software engineer Red Hat From amuller at redhat.com Thu Jul 9 12:48:55 2020 From: amuller at redhat.com (Assaf Muller) Date: Thu, 9 Jul 2020 08:48:55 -0400 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <1594028941528.18866@binero.com> <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> Message-ID: On Thu, Jul 9, 2020 at 7:17 AM Carlos Goncalves wrote: > > > > On Thu, Jul 9, 2020 at 11:45 AM Slawek Kaplonski wrote: >> >> Hi, >> >> Thx for Your feedback. >> >> > On 6 Jul 2020, at 11:49, Tobias Urdin wrote: >> > >> > Hello Slawek, >> > This is very interesting and I think this is the right way to go, speakin from an operator standpoint here. >> > >> > We've started investing time in getting familiar with OVN, how to operate and how to troubleshoot and >> > are looking forward into offloading a lot of work to OVN in the future. 
>> > >> > We are closely looking how we can integrate hardware offloading with OVN+OVS to improve our performance >> > and in the future looking to the new VirtIO backend support for vDPA that has started to mature more. >> > >> > From an operator's view, after getting familiar with OVN, there is a lot of work that needs to be done behind >> > the scenes in order to get to the desired point. >> > >> > * Geneve offloading on NIC, we might need new NICs or new firmware. >> > * We need to migrate away from VXLAN to Geneve encapsulation, how can we migrate our current baremetal approach >> > * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red Hat has driven some work to perform this (an Geneve migration) but there is minimal testing or real world deployments that has tried or documented the approach. >> >> Yes, that’s definitely something which will require more work. >> >> > * And then all misc stuff, we need to look into the new ovn-metadata-agent, should we move Octavia over to OVN yet? >> >> For octavia, there is ovn-octavia provider: https://opendev.org/openstack/ovn-octavia-provider which You can use with OVN instead of using Amphora > > > Before an attempt at moving from amphora to OVN load balancers, it's worth considering all the existing feature limitations of the OVN provider. > > OVN load balancers do not support a large feature set typically available in other load balancer solutions. For example, OVN does not support: > > - Round-robin, weighted round-robin, least connection, source IP, etc. It does only support one balancing algorithm: source IP-Port > - HTTP, HTTPS, Proxy protocols. OVN only supports TCP and UDP with limited capabilities (e.g. no timeout knobs) > - TLS termination > - TLS client authentication > - TLS backend encryption > - Layer 7 features and header manipulation > - Health monitors (WIP) > - Octavia flavors > - Statistics > - Mixed IPv6 and IPv4 VIPs and members. > > More details in https://docs.openstack.org/octavia/latest/user/feature-classification/index.html Exactly. The Amphora and OVN drivers: a) Can be loaded at the same time b) Users can choose which driver to use per LB c) Are complementary, they don't replace one another The intention is that you could use an OVN based LB for 'simple' use cases, where you don't require any of the functionality Carlos highlighted above, and Amphora for the rest. The assumption here is that for simple use cases OVN based LBs perform and scale better, though we haven't quite been able to confirm that yet. > >> >> >> > >> > Then the final, what do we gain vs what do we lose in terms of maintainability, performance and features. >> >> We have document https://docs.openstack.org/neutron/latest/ovn/gaps.html which should describe most of the gaps between ML2/OVS and ML2/OVN backends. >> We are working on closing those gaps but please also keep in mind that ML2/OVS is not going anywhere, if You need any of features from it, You can still use it as it still is and will be maintained backend :) >> >> > >> > But form an operator's view, I'm very positive to the future of a OVN integrated OpenStack. >> >> Thx. I really appreciate this. 
>> >> > >> > Best regards >> > Tobias >> > ________________________________________ >> > From: Slawek Kaplonski >> > Sent: Tuesday, June 23, 2020 12:24 PM >> > To: OpenStack Discuss ML >> > Cc: Assaf Muller; Daniel Alvarez Sanchez >> > Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend >> > >> > Hi, >> > >> > The Neutron team wants to propose a switch of the default Neutron backend in >> > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to >> > OVN with its own ovn-metadata-agent and ovn-controller. >> > We discussed that change during the virtual PTG - see [1]. >> > In this document we want to explain reasons why we want to do that change. >> > >> > >> > OVN in 75 Words >> > --------------- >> > >> > Open Virtual Network is managed under the OVS project, and was created by the >> > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, >> > using lessons learned throughout the years. It is intended to be used in >> > projects such as OpenStack and Kubernetes. OVN has a different architecture, >> > moving us away from Python agents communicating with the Neutron API service >> > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. >> > >> > Here’s a heap of information about OpenStack’s integration of OVN: >> > * OpenStack Boston Summit talk on OVN [2] >> > * Upstream OpenStack networking-ovn documentation [3] and [4] >> > * OSP 13 OVN documentation, including how to install it using Director [5] >> > >> > Neutron OVN driver was developed as a Neutron stadium project, >> > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main >> > Neutron repository. >> > >> > >> > Why? >> > ---- >> > >> > In the Neutron team we believe that OVN and the Neutron OVN driver are built >> > with a modern architecture that offers better foundations for a simpler and >> > more performant solution. We see increased participation in kubernetes-ovn, >> > resulting in a larger core OVN community, and we would like OpenStack to >> > benefit from this Kubernetes driven OVN investment. >> > Neutron OVN driver currently has got some feature parity gaps comparing to >> > ML2/OVS (see [6] for details) but our team is working hard to close those gaps >> > and we believe that this driver is the future for Neutron and that’s why we >> > want to make it the default Neutron ML2 backend in the Devstack configuration. >> > >> > >> > What Does it Mean? >> > ------------------ >> > >> > Since most Openstack projects use Neutron in their CI and gate jobs, this >> > change has the potential for a large impact. >> > But this backend is already tested with various jobs in the Neutron CI and it >> > works fine. Recently (See [7]) we also proposed to add an OVN based job to the >> > Devstack’s check queue. >> > Similarly the default Neutron backend in TripleO was changed in the Stein cycle >> > and there were no any significant issues related strictly to this change. It >> > worked well for other projects. >> > Of course in the Neutron project we will be still gating other drivers, like >> > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of >> > some of the jobs. >> > The Neutron team is *NOT* going to deprecate any of the other existing ML2 >> > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree >> > drivers in the same way as it is now. 
>> > >> > >> > Action Plan >> > ----------- >> > >> > We want to make this change before the Victoria-2 milestone to not make such >> > changes too late in the release cycle. Our action plan is as below: >> > >> > 1. Share the plan and get feedback from the upstream community (this thread) >> > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to >> > Devstack repo - we don’t want to force everyone else to add “enable_plugin >> > neutron” in their local.conf file to use default Neutron backend, >> > 3. Switch default Neutron backend in Devstack to be OVN, >> > a. Switch definition of base devstack CI jobs that it will run Neutron with >> > OVN backend, >> > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack >> > projects to check if it will not break anything in the gate of those projects. >> > 5. If all will be running fine, merge patches proposed in points 3 and 3a. >> > >> > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 >> > [2] https://www.youtube.com/watch?v=sgc7myiX6ts >> > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html >> > [4] https://docs.openstack.org/neutron/latest/ovn/index.html >> > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ >> > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html >> > [7] https://review.opendev.org/#/c/736021/ >> > >> > -- >> > Slawek Kaplonski >> > Senior software engineer >> > Red Hat >> > >> > >> > >> > >> >> — >> Slawek Kaplonski >> Principal software engineer >> Red Hat >> >> From radoslaw.piliszek at gmail.com Thu Jul 9 14:32:16 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 9 Jul 2020 16:32:16 +0200 Subject: [kolla] Today is the first Kall Message-ID: Hiya Folks, today (09 Jul) is the first Kolla's Kall [1]. It starts at 15:00 UTC so in just a bit less than half an hour. (Sorry for the late reminder, these days don't spare me.) Today's agenda is based on one of the top priorities for Victoria - DA DOCS. We decided to give Meetpad a try. It does not record meetings but we will document the meeting on etherpad anyhow. The link is on the referenced wiki page. (The potential fallback will be Google Meet, we will update the wiki). Everyone is free to join. Kolla Kall is development-oriented and focuses on implementation discussion, change planning, release planning, housekeeping, etc. The expected audience is people interested in Kolla projects development, including Kolla, Kolla-Ansible and Kayobe. Looking forward to seeing YOU there. [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall -yoctozepto From arxcruz at redhat.com Thu Jul 9 15:14:58 2020 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 9 Jul 2020 17:14:58 +0200 Subject: [qa][tempest] Update language in tempest code base Message-ID: Hello, I would like to start a discussion regarding the topic. At this moment in time we have an opportunity to be a more open and inclusive project by eliminating outdated naming conventions from tempest codebase, such as blacklist, whitelist. We should take the opportunity and do our best to replace outdated terms with their more inclusive alternatives. As you can see in [1] the TripleO project is already working on this initiative, and I would like to work on this as well on the tempest side. Any thoughts? Shall I start with a sepc, adding deprecation warnings? 
[1] https://review.opendev.org/#/c/740013/1/specs/victoria/renaming_rules.rst Kind regards, -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From emccormick at cirrusseven.com Thu Jul 9 15:51:35 2020 From: emccormick at cirrusseven.com (Erik McCormick) Date: Thu, 9 Jul 2020 11:51:35 -0400 Subject: Hardware requirement for OpenStack HA Cluster In-Reply-To: References: Message-ID: On Thu, Jul 9, 2020 at 4:25 AM Anil Jangam wrote: > Hi All, > > I am looking for hardware requirements (CPU, RAM, HDD) for installing a > OpenStack HA cluster. > So far, I gathered few references: > > - This article talks about CPU and HDD, but they do not comment on > RAM. > - > https://docs.openstack.org/project-deploy-guide/openstack-ansible/ocata/overview-requirements.html > - This article talks about CPU, RAM, and HDD, but it is quite old > (2015) reference. > - > https://docs.huihoo.com/openstack/docs.openstack.org/ha-guide/HAGuide.pdf > (Page 6) > > I am considering the cluster with: 3 Controller (for HA) + 1 Compute + 1 > Storage. > > I have following questions: > > - What is the minimum hardware (CPU, RAM, HDD) requirement to install > a OpenStack HA cluster? > > For memory, you could probably get away with 16 GB on the controllers, but I would go at least 32. I have 64 in mine. My lightly loaded dev cluster sits at about 15GB used under light load. For a small cluster, I wouldn't go less than 4 cores. A single 8 core CPU will be plenty. If you think you're going to grow it and make heavy use of the APIs then double it. For HDD, you can get away with like 100 GB or even less, but you need to account for your Glance images assuming you're storing them locally. You'll also need space for logging if you're going ot deploy an ELK (or EFK) stack with it. Databases are fairly small. In a cluster with only a few compute nodes, they probably will be around 5 or 6 GB total. If you can throw a 1TB SSD at it, that should be plenty for a small cluster. > > - Can we have 3 Controller nodes installed on 3 Virtual Machines or do > we need 3 independent (bare metal) servers? > - So in case of VM-based controllers, the cluster will be hybrid in > nature. > - I do not know if this is even possible and a recommended design. > > I guess it depends on your threshold for failure. It seems to me to defeat the purpose of HA to stick everything on one physical box. It's certainly fine for testing / demonstration purposes. Is it supported? Sure. Is it recommended? No. > > - > - Do we need the Platform Director node in addition to controller and > compute/storage nodes? > > I am not familiar with OSA enough to say for sure, but I don't think so. You should be able to deploy with 'localhost' in your inventory as one of your controllers. You can also simply run the deployment from a linux VM on a laptop if you want. You shouldn't have to dedicate something. That being said, if you have a box where those things can live and be used repeatedly for reconfiguration and upgrade, it would probably make your life less complicated. > Thanks in advance. > Anil. > > > -Erik -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmann at ghanshyammann.com Thu Jul 9 15:57:14 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 09 Jul 2020 10:57:14 -0500 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: References: Message-ID: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz wrote ---- > Hello, > I would like to start a discussion regarding the topic. > At this moment in time we have an opportunity to be a more open and inclusive project by eliminating outdated naming conventions from tempest codebase, such as blacklist, whitelist.We should take the opportunity and do our best to replace outdated terms with their more inclusive alternatives.As you can see in [1] the TripleO project is already working on this initiative, and I would like to work on this as well on the tempest side. Thanks Arx for raising it. I always have hard time to understand the definition of 'outdated naming conventions ' are they outdated from coding language perspective or outdated as English language perspective? I do not see naming used in coding language should be matched with English as grammar/outdated/new style language. As long as they are not so bad (hurt anyone culture, abusing word etc) it is fine to keep them as it is and start adopting new names for new things we code. For me, naming convention are the things which always can be improved over time, none of the name is best suited for everyone in open source. But we need to understand whether it is worth to do in term of 1. effort of changing those 2. un- comfortness of adopting new names 3. again changing in future. At least from Tempest perspective, blacklist is very known common word used for lot of interfaces and dependent testing tool. I cannot debate on how good it is or bad but i can debate on not-worth to change now. For new interface, we can always use best-suggested name as per that time/culture/maintainers. We have tried few of such improvement in past but end up not-successful. Example: - https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a7f493b6086cab9/tempest/test.py#L43 -gmann > > Any thoughts? Shall I start with a sepc, adding deprecation warnings? 
> > [1] https://review.opendev.org/#/c/740013/1/specs/victoria/renaming_rules.rst > Kind regards, > > > -- > Arx Cruz > Software Engineer > Red Hat EMEA > arxcruz at redhat.com > @RedHat Red Hat Red Hat > From peljasz at yahoo.co.uk Thu Jul 9 16:00:22 2020 From: peljasz at yahoo.co.uk (lejeczek) Date: Thu, 9 Jul 2020 17:00:22 +0100 Subject: RDO - ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' References: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e.ref@yahoo.co.uk> Message-ID: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> Hi guys, I've packstaked a deployment with: CONFIG_CINDER_BACKEND=gluster CONFIG_CINDER_VOLUMES_CREATE=y CONFIG_CINDER_GLUSTER_MOUNTS=127.0.0.1:/VMs But after seemingly all work okey I keep getting: 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume Traceback (most recent call last): 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/cmd/volume.py", line 103, in _launch_service 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     cluster=cluster) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/service.py", line 400, in create 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     cluster=cluster, **kwargs) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/service.py", line 155, in __init__ 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     *args, **kwargs) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 267, in __init__ 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     active_backend_id=curr_active_backend_id) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 44, in import_object 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     return import_class(import_str)(*args, **kwargs) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 30, in import_class 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     __import__(mod_str) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume ... I'm Centos 8 with "ussuri". Would you know & share a solution? many thanks, L. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 9 16:05:54 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 9 Jul 2020 18:05:54 +0200 Subject: [kolla] Today is the first Kall In-Reply-To: References: Message-ID: And it's a wrap! The first Kall was pretty successful and we left the notes in https://etherpad.opendev.org/p/kollakall Thanks all for joining the first Kall! See you next time! (In two weeks time) -yoctozepto On Thu, Jul 9, 2020 at 4:32 PM Radosław Piliszek wrote: > > Hiya Folks, > > today (09 Jul) is the first Kolla's Kall [1]. > It starts at 15:00 UTC so in just a bit less than half an hour. > (Sorry for the late reminder, these days don't spare me.) > > Today's agenda is based on one of the top priorities for Victoria - DA DOCS. > We decided to give Meetpad a try. It does not record meetings but we > will document the meeting on etherpad anyhow. > The link is on the referenced wiki page. 
> (The potential fallback will be Google Meet, we will update the wiki). > > Everyone is free to join. Kolla Kall is development-oriented and > focuses on implementation discussion, change planning, release > planning, housekeeping, etc. The expected audience is people > interested in Kolla projects development, including Kolla, > Kolla-Ansible and Kayobe. > > Looking forward to seeing YOU there. > > [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall > > -yoctozepto From waboring at hemna.com Thu Jul 9 16:10:15 2020 From: waboring at hemna.com (Walter Boring) Date: Thu, 9 Jul 2020 12:10:15 -0400 Subject: RDO - ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' In-Reply-To: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> References: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e.ref@yahoo.co.uk> <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> Message-ID: Glusterfs driver was deprecated in the Newton release and removed in the Ocata release. https://docs.openstack.org/releasenotes/cinder/ocata.html On Thu, Jul 9, 2020 at 12:05 PM lejeczek wrote: > Hi guys, > > I've packstaked a deployment with: > > CONFIG_CINDER_BACKEND=gluster > CONFIG_CINDER_VOLUMES_CREATE=y > CONFIG_CINDER_GLUSTER_MOUNTS=127.0.0.1:/VMs > > But after seemingly all work okey I keep getting: > > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume Traceback (most > recent call last): > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/cmd/volume.py", line 103, in > _launch_service > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > cluster=cluster) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/service.py", line 400, in create > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > cluster=cluster, **kwargs) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/service.py", line 155, in __init__ > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume *args, > **kwargs) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 267, in > __init__ > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > active_backend_id=curr_active_backend_id) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 44, in > import_object > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume return > import_class(import_str)(*args, **kwargs) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 30, in > import_class > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > __import__(mod_str) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ... > > I'm Centos 8 with "ussuri". > Would you know & share a solution? > > many thanks, L. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ltoscano at redhat.com Thu Jul 9 16:13:14 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Thu, 09 Jul 2020 18:13:14 +0200 Subject: RDO - ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' In-Reply-To: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> References: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e.ref@yahoo.co.uk> <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> Message-ID: <3161947.k3LOHGUjKi@whitebase.usersys.redhat.com> On Thursday, 9 July 2020 18:00:22 CEST lejeczek wrote: > Hi guys, > > I've packstaked a deployment with: > > CONFIG_CINDER_BACKEND=gluster > CONFIG_CINDER_VOLUMES_CREATE=y > CONFIG_CINDER_GLUSTER_MOUNTS=127.0.0.1:/VMs > > But after seemingly all work okey I keep getting: > > [...] > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ModuleNotFoundError: No module named > 'cinder.volume.drivers.glusterfs' > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ... > > I'm Centos 8 with "ussuri". > Would you know & share a solution? The glusterfs volume driver for cinder was deprecated in the newton release and removed during the pike cycle: https://review.opendev.org/#/c/377028/ There is still a glusterfs *backup* driver, not sure about its status though. -- Luigi From ltoscano at redhat.com Thu Jul 9 16:15:11 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Thu, 09 Jul 2020 18:15:11 +0200 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> Message-ID: <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz wrote > ---- > > Hello, > > I would like to start a discussion regarding the topic. > > At this moment in time we have an opportunity to be a more open and > > inclusive project by eliminating outdated naming conventions from > > tempest codebase, such as blacklist, whitelist.We should take the > > opportunity and do our best to replace outdated terms with their more > > inclusive alternatives.As you can see in [1] the TripleO project is > > already working on this initiative, and I would like to work on this as > > well on the tempest side. > Thanks Arx for raising it. > > I always have hard time to understand the definition of 'outdated naming > conventions ' are they outdated from coding language perspective or > outdated as English language perspective? I do not see naming used in > coding language should be matched with English as grammar/outdated/new > style language. As long as they are not so bad (hurt anyone culture, > abusing word etc) it is fine to keep them as it is and start adopting new > names for new things we code. > > For me, naming convention are the things which always can be improved over > time, none of the name is best suited for everyone in open source. But we > need to understand whether it is worth to do in term of 1. effort of > changing those 2. un- comfortness of adopting new names 3. again changing > in future. > > At least from Tempest perspective, blacklist is very known common word used > for lot of interfaces and dependent testing tool. I cannot debate on how > good it is or bad but i can debate on not-worth to change now. For new > interface, we can always use best-suggested name as per that > time/culture/maintainers. We have tried few of such improvement in past but > end up not-successful. 
Example: - > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > 7f493b6086cab9/tempest/test.py#L43 > That's not the only used terminology for list of things, though. We could always add new interfaces and keep the old ones are deprecated (but not advertised) for the foreseable future. The old code won't be broken and the new one would use the new terminology, I'd say it's a good solution. -- Luigi From elod.illes at est.tech Thu Jul 9 16:27:21 2020 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Thu, 9 Jul 2020 18:27:21 +0200 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: References: Message-ID: <8225c61e-687c-0116-da07-52443f315e43@est.tech> Hi, Sorry for sticking my nose into this thread (again o:)), just a couple of thoughts: - we had a rough month with failing Devstack and Tempest (and other) jobs, but thanks to Gmann and others we could fix most of the issues (except Tempest in Ocata, that's why it is announced generally as Unmaintained [0]) - this added some extra time to show a branch as unmaintained - branches in extended maintenance are not that busy branches, but still, I see some bugfix backports coming in even in Pike (in spite of failing gate in the last month) - Lee announced nova's Unmaintained state in the same circumstances, as we just fixed Pike's devstack - and I also sent a reply that I will continue to maintain nova's stable/pike as it is getting in a better shape now Last but not least: in cinder, there are "Zuul +1"d gate fixes both for Pike [1] (and Queens [2]), so it's not that hopeless. I don't want to keep a broken branch open in any cost, but does it cost that much? I mean, if there is the possibility to push a fix, why don't we let it happen? Right now Cinder Pike's gate seems working (with the fix, which needs an approve [1]). My suggestion is that let Pike still be in Extended Maintenance as it is still have a working gate ([1]) and EOL Ocata as it was already about to happen according to the mail thread [0], if necessary. Also, please check the steps in 'End of Life' chapter of the stable guideline [3] and let me offer my help if you need it for the transition. Cheers, Előd [0] http://lists.openstack.org/pipermail/openstack-discuss/2020-May/thread.html#15112 [1] https://review.opendev.org/#/c/737094/ [2] https://review.opendev.org/#/c/737093/ [3] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life On 2020. 07. 08. 23:14, Brian Rosmaita wrote: > Lee Yarwood recently announced the change to 'unmaintained' status of > nova stable/ocata [0] and stable/pike [1] branches, with the clever > idea of back-dating the 6 month period of un-maintenance to the most > recent commit to each branch.  I took a look at cinder stable/ocata > and stable/pike, and the most recent commit to each is 8 months ago > and 7 months ago, respectively. > > The Cinder team discussed this at today's Cinder meeting and agreed > that this email will serve as notice to the OpenStack Community that > the following openstack/cinder branches have been in 'unmaintained' > status for the past 6 months: > - stable/ocata > - stable/pike > > The Cinder team hereby serves notice that it is our intent to ask the > openstack infra team to tag each as EOL at its current HEAD and delete > the branches two weeks from today, that is, on Wednesday, 22 July 2020. > > (This applies also to the other stable-branched cinder repositories, > that is, os-brick, python-cinderclient, and > python-cinderclient-extension.) 
> > Please see [2] for information about the maintenance phases and what > action would need to occur before 22 July for a branch to be adopted > back to the 'extended maintenance' phase. > > On behalf of the Cinder team, thank you for your attention to this > matter. > > > cheers, > brian > > > [0] > http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html > [2] https://docs.openstack.org/project-team-guide/stable-branches.html > From arxcruz at redhat.com Thu Jul 9 16:45:19 2020 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 9 Jul 2020 18:45:19 +0200 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> Message-ID: Yes, that's the idea. We can keep the old interface for a few cycles, with warning deprecation message advertising to use the new one, and then remove in the future. Kind regards, On Thu, Jul 9, 2020 at 6:15 PM Luigi Toscano wrote: > On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz > wrote > > ---- > > > Hello, > > > I would like to start a discussion regarding the topic. > > > At this moment in time we have an opportunity to be a more open and > > > inclusive project by eliminating outdated naming conventions from > > > tempest codebase, such as blacklist, whitelist.We should take the > > > opportunity and do our best to replace outdated terms with their more > > > inclusive alternatives.As you can see in [1] the TripleO project is > > > already working on this initiative, and I would like to work on this > as > > > well on the tempest side. > > Thanks Arx for raising it. > > > > I always have hard time to understand the definition of 'outdated naming > > conventions ' are they outdated from coding language perspective or > > outdated as English language perspective? I do not see naming used in > > coding language should be matched with English as grammar/outdated/new > > style language. As long as they are not so bad (hurt anyone culture, > > abusing word etc) it is fine to keep them as it is and start adopting new > > names for new things we code. > > > > For me, naming convention are the things which always can be improved > over > > time, none of the name is best suited for everyone in open source. But we > > need to understand whether it is worth to do in term of 1. effort of > > changing those 2. un- comfortness of adopting new names 3. again changing > > in future. > > > > At least from Tempest perspective, blacklist is very known common word > used > > for lot of interfaces and dependent testing tool. I cannot debate on how > > good it is or bad but i can debate on not-worth to change now. For new > > interface, we can always use best-suggested name as per that > > time/culture/maintainers. We have tried few of such improvement in past > but > > end up not-successful. Example: - > > > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > > 7f493b6086cab9/tempest/test.py#L43 > > > > That's not the only used terminology for list of things, though. We could > always add new interfaces and keep the old ones are deprecated (but not > advertised) for the foreseable future. The old code won't be broken and > the > new one would use the new terminology, I'd say it's a good solution. 
> > > -- > Luigi > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 9 16:50:35 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 9 Jul 2020 18:50:35 +0200 Subject: [TripleO]Documentation to list all options in yaml file and possible values Message-ID: Hi all, 1) Is there a page or a draft, where all options of TripleO are available? 2) Is there a page or a draft, where dependencies of each option are listed? 3) Is there a page or a draft, where all possible values for each option would be listed? -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jul 9 17:06:39 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 09 Jul 2020 12:06:39 -0500 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> Message-ID: <173348b1df7.b5898c11633886.9175405555090897907@ghanshyammann.com> ---- On Thu, 09 Jul 2020 11:45:19 -0500 Arx Cruz wrote ---- > Yes, that's the idea. > We can keep the old interface for a few cycles, with warning deprecation message advertising to use the new one, and then remove in the future. Deprecating things leads to two situations which really need some good reason before doing it: - If we keep the deprecated interfaces working along with new interfaces then it is confusion for users as well as maintenance effort. In my experience, very less migration happen to new things if old keep working. - If we remove them in future then it is breaking change. IMO, we need to first ask/analyse whether name changes are worth to do with above things as results. Or in other team we should first define what is 'outdated naming conventions' and how worth to fix those. -gmann > Kind regards, > > On Thu, Jul 9, 2020 at 6:15 PM Luigi Toscano wrote: > > > -- > Arx Cruz > Software Engineer > Red Hat EMEA > arxcruz at redhat.com > @RedHat Red Hat Red Hat > On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz wrote > > ---- > > > Hello, > > > I would like to start a discussion regarding the topic. > > > At this moment in time we have an opportunity to be a more open and > > > inclusive project by eliminating outdated naming conventions from > > > tempest codebase, such as blacklist, whitelist.We should take the > > > opportunity and do our best to replace outdated terms with their more > > > inclusive alternatives.As you can see in [1] the TripleO project is > > > already working on this initiative, and I would like to work on this as > > > well on the tempest side. > > Thanks Arx for raising it. > > > > I always have hard time to understand the definition of 'outdated naming > > conventions ' are they outdated from coding language perspective or > > outdated as English language perspective? I do not see naming used in > > coding language should be matched with English as grammar/outdated/new > > style language. As long as they are not so bad (hurt anyone culture, > > abusing word etc) it is fine to keep them as it is and start adopting new > > names for new things we code. 
> > > > For me, naming convention are the things which always can be improved over > > time, none of the name is best suited for everyone in open source. But we > > need to understand whether it is worth to do in term of 1. effort of > > changing those 2. un- comfortness of adopting new names 3. again changing > > in future. > > > > At least from Tempest perspective, blacklist is very known common word used > > for lot of interfaces and dependent testing tool. I cannot debate on how > > good it is or bad but i can debate on not-worth to change now. For new > > interface, we can always use best-suggested name as per that > > time/culture/maintainers. We have tried few of such improvement in past but > > end up not-successful. Example: - > > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > > 7f493b6086cab9/tempest/test.py#L43 > > > > That's not the only used terminology for list of things, though. We could > always add new interfaces and keep the old ones are deprecated (but not > advertised) for the foreseable future. The old code won't be broken and the > new one would use the new terminology, I'd say it's a good solution. > > > -- > Luigi > > > From fungi at yuggoth.org Thu Jul 9 17:26:23 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 9 Jul 2020 17:26:23 +0000 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> Message-ID: <20200709172622.kbl2pdkycjp6gouv@yuggoth.org> On 2020-07-09 10:57:14 -0500 (-0500), Ghanshyam Mann wrote: [...] > I always have hard time to understand the definition of 'outdated > naming conventions' are they outdated from coding language > perspective or outdated as English language perspective? [...] It's a recently popular euphemism for words which make people uncomfortable. Unfortunately, rather than addressing the problem head on and admitting that's the primary driver for the change, it has become preferable to pretend that's not the impetus for wholesale replacements of established terminology (often in an attempt to avoid heated debate over the value of such changes). Don't get me wrong, I think it's entirely reasonable to replace words or phrases which make people uncomfortable, and in many cases it's an opportunity to improve our terminology by using words which have direct meaning rather than relying on computer science jargon based on idiom and loose analogy. Even if this comes at the cost of some engineering effort, it can be a long-term improvement. But let's not kid ourselves, we're replacing words because they're deemed offensive. It's disingenuous, even potentially insulting, to imply otherwise. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From arxcruz at redhat.com Thu Jul 9 17:34:55 2020 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 9 Jul 2020 19:34:55 +0200 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <173348b1df7.b5898c11633886.9175405555090897907@ghanshyammann.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> <173348b1df7.b5898c11633886.9175405555090897907@ghanshyammann.com> Message-ID: Well, at some point, it needs to break :) I was for a long time maintainer of gnome modules, more specifically zenity and in order to move forward with some functionalities we had to break stuff. We could not keep legacy code and move forward with new functionalities, and the gnome strategy is pretty simple: minor version, you must maintain api compatibility. Major version, let's break everything! The user can either stay in the version X.y.z, or update their code to version X+1.y.z. That's exactly what happened when gnome/gtk released the 3.x version, and what will happen with the future 4.x version. So, it's very hard to try new things, when you must maintain forever old things. The naming is for some people a problem, and we should make an effort to change that. Sometimes we don't see this as an issue, because it is so deeply rooted in our lives, that we don't see it as a problem. I'll give you an example we have in Brazil: One of the biggest children authors, known as Monteiro Lobato [1], was a very racist person, and he put all his racism in books, the books we have to read at school. So, in one of his famous books he has this character called Tia Anastácia, and another one the smart one called Pedrinho. So, Pedrinho always calls Tia Anastácia as: "That black lady" or: She is as black as a Gorilla, and people thought this was fine, and funny. And it was an official lecture in schools in Brazil, and even had a TV Show about it. I was one of those who watched and read those books, and always thought this was OKAY. Today, my daughter will never read Monteiro Lobato, and hopefully she will understand that is wrong if people call you "black as a Gorilla", no matter the context. Now, imagine you grow up reading these stories, how would you feel? ;) This is also right in code, you might not care, but there are people who are very sensible to some naming convention. Master/Slave may sound uncomfortable. Specially for people who have 400 years of slavery in their history. As an open source community, we should be able to fight against this, and make it a good code and environment for people who are new, and want to contribute, but not feel comfortable with some naming convention. You might say there's no such thing, but trust me they exist, and we should be working to make these people comfortable and welcome to our community. It's not about breaking code, it's about fixing it :) 1 - https://en.wikipedia.org/wiki/Monteiro_Lobato Kind regards, On Thu, Jul 9, 2020 at 7:06 PM Ghanshyam Mann wrote: > ---- On Thu, 09 Jul 2020 11:45:19 -0500 Arx Cruz > wrote ---- > > Yes, that's the idea. > > We can keep the old interface for a few cycles, with warning > deprecation message advertising to use the new one, and then remove in the > future. 
> > Deprecating things leads to two situations which really need some good > reason before doing it: > > - If we keep the deprecated interfaces working along with new interfaces > then it is confusion for users > as well as maintenance effort. In my experience, very less migration > happen to new things if old keep working. > > - If we remove them in future then it is breaking change. > > IMO, we need to first ask/analyse whether name changes are worth to do > with above things as results. Or in other > team we should first define what is 'outdated naming conventions' and how > worth to fix those. > > -gmann > > > > Kind regards, > > > > On Thu, Jul 9, 2020 at 6:15 PM Luigi Toscano > wrote: > > > > > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > > @RedHat Red > Hat Red Hat > > > On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > > > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz > wrote > > > ---- > > > > Hello, > > > > I would like to start a discussion regarding the topic. > > > > At this moment in time we have an opportunity to be a more open and > > > > inclusive project by eliminating outdated naming conventions from > > > > tempest codebase, such as blacklist, whitelist.We should take the > > > > opportunity and do our best to replace outdated terms with their > more > > > > inclusive alternatives.As you can see in [1] the TripleO project is > > > > already working on this initiative, and I would like to work on > this as > > > > well on the tempest side. > > > Thanks Arx for raising it. > > > > > > I always have hard time to understand the definition of 'outdated > naming > > > conventions ' are they outdated from coding language perspective or > > > outdated as English language perspective? I do not see naming used in > > > coding language should be matched with English as grammar/outdated/new > > > style language. As long as they are not so bad (hurt anyone culture, > > > abusing word etc) it is fine to keep them as it is and start adopting > new > > > names for new things we code. > > > > > > For me, naming convention are the things which always can be improved > over > > > time, none of the name is best suited for everyone in open source. > But we > > > need to understand whether it is worth to do in term of 1. effort of > > > changing those 2. un- comfortness of adopting new names 3. again > changing > > > in future. > > > > > > At least from Tempest perspective, blacklist is very known common > word used > > > for lot of interfaces and dependent testing tool. I cannot debate on > how > > > good it is or bad but i can debate on not-worth to change now. For new > > > interface, we can always use best-suggested name as per that > > > time/culture/maintainers. We have tried few of such improvement in > past but > > > end up not-successful. Example: - > > > > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > > > 7f493b6086cab9/tempest/test.py#L43 > > > > > > > That's not the only used terminology for list of things, though. We > could > > always add new interfaces and keep the old ones are deprecated (but not > > advertised) for the foreseable future. The old code won't be broken and > the > > new one would use the new terminology, I'd say it's a good solution. > > > > > > -- > > Luigi > > > > > > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joh.scheuer at gmail.com Thu Jul 9 15:35:57 2020 From: joh.scheuer at gmail.com (Johannes Scheuermann) Date: Thu, 9 Jul 2020 17:35:57 +0200 Subject: Neutron Agent Migration Message-ID: Hi together, currently we exploring how we can reboot a compute node without any interruptions for the networking stack. We run Openstack Train with ml2 driver Linux bridge and dnsmasq for DHCP and internal DNS. The DHCP setup runs as high availability setup with 3 replicas. During our tests we identified the following challenges: 1.) If we reboot the machine without doing anything on the network layer all ports will be rescheduled. Also the networks will be removed from the (dead) agent and will be reassigned to another agent. But for each reboot we have some leftover ports with the device-id "reserved_dhcp_port". These ports can safely deleted (we haven't figured out where the issue in the neutron code is). 2.) If we disable the network agent like described here: https://docs.openstack.org/neutron/train/admin/config-dhcp-ha.html and then remove the disabled agent from all networks we have an even worse behaviour since the neutron scheduler doesn't reschedule the network to a different agent. So what is the correct way to ensure that the reboot of a node has no (or only small) interruptions to the networking service? The current issue is that if we remove one agent we might remove the port that is the first entry in the clients (VM's) resolv.conf which means that each request will be delayed by the default timeout. And is there any option to "migrate" a network from one agent to another? Thanks in advance, Johannes Scheuermann From zhangbailin at inspur.com Fri Jul 10 02:11:00 2020 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Fri, 10 Jul 2020 02:11:00 +0000 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver Message-ID: Hi all: This release we want to introduce some 3rd party drivers (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. Due to the lack of CI test environment supported by hardware, we reached a temporary solution in two ways, as follows: 1. Provide a CI environment and provide a tempest test for Cyborg, this method is recommended; 2. If there is no CI environment, please provide the test results of this driver in the master branch or in the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. [1] http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-07-02-03.05.log.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From yumeng_bao at yahoo.com Fri Jul 10 05:37:04 2020 From: yumeng_bao at yahoo.com (yumeng bao) Date: Fri, 10 Jul 2020 13:37:04 +0800 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> Message-ID: <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> Brin, thanks for bringing this up! > Hi all: > This release we want to introduce some 3rd party drivers (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > Due to the lack of CI test environment supported by hardware, we reached a temporary solution in two ways, as follows: > 1. Provide a CI environment and provide a tempest test for Cyborg, this method is recommended; > 2. 
If there is no CI environment, please provide the test results of this driver in the master branch or in the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. Providing test result can be our option. The test result can be part of the driver documentation[0] as this is public to users. And from my understanding, the test result should work as the role of tempest case and clarify at least: necessary configuration,test operations and test results. [0] https://docs.openstack.org/cyborg/latest/reference/support-matrix.html#driver-support > [1] http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-07-02-03.05.log.html Regards, Yumeng From gouthampravi at gmail.com Fri Jul 10 05:37:35 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Thu, 9 Jul 2020 22:37:35 -0700 Subject: [manila][stable] moving some stable branches to EOL Message-ID: Hello Stackers, There have been no changes to the stable/ocata [1], driverfixes/mitaka [2], driverfixes/newton [3] and driverfixes/ocata [4] branches of openstack/manila in a year [1] and the manila team today decided [2] that it was time to close this branches. While we routinely get requests from users, vendors and distributions to backport bug fixes to older releases, no one seems to want any further changes in these branches. We'd also like stable/pike to be EOL'ed, the last change to that branch was a CVE fix made three months ago. Keeping these branches open may give the impression that we'd continue to take backports in, and support them with bugfixes, when the reality is that we're struggling to keep meaningful testing in stable/queens and stable/rocky branches - something we've seen most bugfix/backport requests for. If there are no objections, I'll propose an EOL patch and request the infra team to help delete these branches. Thanks, Goutham [1] https://opendev.org/openstack/manila/commits/branch/stable/ocata [2] https://opendev.org/openstack/manila/src/branch/driverfixes/mitaka [3] https://opendev.org/openstack/manila/src/branch/driverfixes/newton [4] https://opendev.org/openstack/manila/src/branch/driverfixes/ocata [2] http://eavesdrop.openstack.org/meetings/manila/2020/manila.2020-07-09-15.01.log.html#l-80 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Fri Jul 10 07:18:18 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 10 Jul 2020 09:18:18 +0200 Subject: Neutron Agent Migration In-Reply-To: References: Message-ID: <06314D7F-366B-4E64-95E7-4F979D315512@redhat.com> Hi, > On 9 Jul 2020, at 17:35, Johannes Scheuermann wrote: > > Hi together, > > currently we exploring how we can reboot a compute node without any interruptions for the networking stack. > We run Openstack Train with ml2 driver Linux bridge and dnsmasq for DHCP and internal DNS. > The DHCP setup runs as high availability setup with 3 replicas. > During our tests we identified the following challenges: > > 1.) > > If we reboot the machine without doing anything on the network layer all ports will be rescheduled. > Also the networks will be removed from the (dead) agent and will be reassigned to another agent. > But for each reboot we have some leftover ports with the device-id "reserved_dhcp_port". > These ports can safely deleted (we haven't figured out where the issue in the neutron code is). 
It’s done here https://opendev.org/openstack/neutron/src/branch/master/neutron/db/agentschedulers_db.py#L419 and it’s done by purpose. The issue may be that this reserved port should be used on the new agent so we should check why it isn’t and why new port is created for new agent. > > 2.) > > If we disable the network agent like described here: https://docs.openstack.org/neutron/train/admin/config-dhcp-ha.html > and then remove the disabled agent from all networks we have an even worse behaviour since the neutron scheduler doesn't reschedule the network to a different agent. > > So what is the correct way to ensure that the reboot of a node has no (or only small) interruptions to the networking service? > The current issue is that if we remove one agent we might remove the port that is the first entry in the clients (VM's) resolv.conf which means that each request will be delayed by the default timeout. > > And is there any option to "migrate" a network from one agent to another? You can manually remove networks from one agent with command like: $ neutron dhcp-agent-network-remove And then add it to the new one with: $ neutron dhcp-agent-network-add > > Thanks in advance, > > Johannes Scheuermann > > — Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Fri Jul 10 08:16:12 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 10 Jul 2020 10:16:12 +0200 Subject: [neutron] Team meeting - day change proposal Message-ID: Hi, During our last Monday team meeting I proposed to cancel this bi-weekly Monday meeting and have weekly meeting on Tuesday always. It is officially proposed in https://review.opendev.org/#/c/739780/ - if You usually attends those meetings and You didn’t check it yet, please do and give +1 if You are ok with such change. Thx in advance. — Slawek Kaplonski Principal software engineer Red Hat From tonyppe at gmail.com Fri Jul 10 08:18:53 2020 From: tonyppe at gmail.com (Tony Pearce) Date: Fri, 10 Jul 2020 16:18:53 +0800 Subject: [magnum] failed to launch Kubernetes cluster Message-ID: Hi team, I hope you are all keeping safe and well at the moment. I am trying to use magnum to launch a kubernetes cluster. I have tried different images but currently using Fedora-Atomic 27. The cluster deployment from the cluster template is failing and I am here to ask if you could please point me in the right direction? I have become stuck and I am uncertain how to further troubleshoot this. The cluster seems to fail a few minutes after booting up the master node because after I see the logs ([1],[2]), I do not see any progress in terms of new (different) logs or load on the master. Then the 60-minute timeout is reached and fails the cluster. I deployed this openstack stack using kayobe (kolla-ansible) and this is version Train. This is deployed on CentOS 7 within docker containers. Kayobe manages this deployment through the ansible playbooks. This was previously working some months back although I think I may have used coreos image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed. The only change being the hostname for controller node as updated in the inventory file for the kayobe. Since then which was a month or so back I've been unable to successfully deploy a kubernetes cluster. I've tried other fedora-atomic images as well as coreos without success. 
When using the coreos image and when tagging the image with the coreos tag as per the magnum docs, the instance fails to boot and goes to the rescue shell. However if I manually launch the coreos image then it does successfully boot and get configured via cloud-init. All of the deployment attempts stop at the same place when using fedora image and I have a different experience if I disable TLS: TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml TLS disabled: master and nodes launched but later fails. I didnt investigate this very much. When looking for help around the web, I found this which looks to be the same issue that I have at the moment (although he's deployed slightly differently, using centos8 and mentions magnum 10): https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ I have the same log messages on the master node within heat. When going through the troubleshooting guide I see that etcd is running and no errors however I dont see any flannel service at all. But I also don't know if this has simply failed before getting to deploy flannel or whether flannel is the reason. I did try to deploy using a cluster template that is using calico as a test but the same result from the logs. When looking at the stack via cli to see the failed stacks this is what I see there: http://paste.openstack.org/show/795736/ I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and 2GB memory. Storage is only via cinder as I am using iscsi storage with a cinder driver. I dont have any other storage. On the master, after the failure the heat log repeats these logs: ++ curl --silent http://127.0.0.1:8080/healthz + '[' ok = ok ']' + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}' error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable Trying to label master node with node-role.kubernetes.io/master="" + echo 'Trying to label master node with node-role.kubernetes.io/master=""' + sleep 5s [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ May I ask if anyone has a recent deployment of Magnum and a working deployment of kubernetes that could share with me the relevant details like the image you have used so that I can try and replicate? To create the cluster template I have been using: openstack coe cluster template create k8s-cluster-template \ --image Fedora-Atomic-27 \ --keypair testpair \ --external-network physnet2vlan20 \ --dns-nameserver 192.168.7.233 \ --flavor 2GB-2vCPU \ --docker-volume-size 15 \ --network-driver flannel \ --coe kubernetes If I have missed anything, I am happy to provide it. Many thanks in advance for any help or pointers on this. Regards, Tony Pearce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Fri Jul 10 08:24:34 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Fri, 10 Jul 2020 09:24:34 +0100 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: References: Message-ID: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> Hi Tony That is a known issue and is due to the default version of heat container agent baked into Train release. Please use label heat_container_agent_tag=train-stable-3 and you should be good to go. 
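For example, recreating the cluster template from your mail with that label set would look something like the following (treat this as a sketch: only the --labels part is the important bit, the other flags are simply copied from your original command and may need adjusting for your environment):

openstack coe cluster template create k8s-cluster-template \
  --image Fedora-Atomic-27 \
  --keypair testpair \
  --external-network physnet2vlan20 \
  --dns-nameserver 192.168.7.233 \
  --flavor 2GB-2vCPU \
  --docker-volume-size 15 \
  --network-driver flannel \
  --coe kubernetes \
  --labels heat_container_agent_tag=train-stable-3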
Cheers Bharat > On 10 Jul 2020, at 09:18, Tony Pearce wrote: > > Hi team, I hope you are all keeping safe and well at the moment. > > I am trying to use magnum to launch a kubernetes cluster. I have tried different images but currently using Fedora-Atomic 27. The cluster deployment from the cluster template is failing and I am here to ask if you could please point me in the right direction? I have become stuck and I am uncertain how to further troubleshoot this. The cluster seems to fail a few minutes after booting up the master node because after I see the logs ([1],[2]), I do not see any progress in terms of new (different) logs or load on the master. Then the 60-minute timeout is reached and fails the cluster. > > I deployed this openstack stack using kayobe (kolla-ansible) and this is version Train. This is deployed on CentOS 7 within docker containers. Kayobe manages this deployment through the ansible playbooks. > > This was previously working some months back although I think I may have used coreos image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed. The only change being the hostname for controller node as updated in the inventory file for the kayobe. > Since then which was a month or so back I've been unable to successfully deploy a kubernetes cluster. I've tried other fedora-atomic images as well as coreos without success. When using the coreos image and when tagging the image with the coreos tag as per the magnum docs, the instance fails to boot and goes to the rescue shell. However if I manually launch the coreos image then it does successfully boot and get configured via cloud-init. All of the deployment attempts stop at the same place when using fedora image and I have a different experience if I disable TLS: > > TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml > > TLS disabled: master and nodes launched but later fails. I didnt investigate this very much. > > When looking for help around the web, I found this which looks to be the same issue that I have at the moment (although he's deployed slightly differently, using centos8 and mentions magnum 10): > https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ > > I have the same log messages on the master node within heat. > > When going through the troubleshooting guide I see that etcd is running and no errors however I dont see any flannel service at all. But I also don't know if this has simply failed before getting to deploy flannel or whether flannel is the reason. I did try to deploy using a cluster template that is using calico as a test but the same result from the logs. > > When looking at the stack via cli to see the failed stacks this is what I see there: http://paste.openstack.org/show/795736/ > > I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and 2GB memory. > Storage is only via cinder as I am using iscsi storage with a cinder driver. I dont have any other storage. 
> > On the master, after the failure the heat log repeats these logs: > > ++ curl --silent http://127.0.0.1:8080/healthz > + '[' ok = ok ']' > + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master ": ""}}}' > error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable > Trying to label master node with node-role.kubernetes.io/master= "" > + echo 'Trying to label master node with node-role.kubernetes.io/master= ""' > + sleep 5s > > [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ > [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ > > May I ask if anyone has a recent deployment of Magnum and a working deployment of kubernetes that could share with me the relevant details like the image you have used so that I can try and replicate? > > To create the cluster template I have been using: > openstack coe cluster template create k8s-cluster-template \ > --image Fedora-Atomic-27 \ > --keypair testpair \ > --external-network physnet2vlan20 \ > --dns-nameserver 192.168.7.233 \ > --flavor 2GB-2vCPU \ > --docker-volume-size 15 \ > --network-driver flannel \ > --coe kubernetes > > > If I have missed anything, I am happy to provide it. > > Many thanks in advance for any help or pointers on this. > > Regards, > > Tony Pearce > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Fri Jul 10 12:32:24 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Fri, 10 Jul 2020 07:32:24 -0500 Subject: [tc] [all] Topics for Cross Community Discussion with Kubernetes ... Message-ID: All, Recently, the OpenStack TC has reached out to the Kubernetes Steering Committee for input as we have proposed adding a starter-kit:kubernetes-in-virt tag for projects in OpenStack. This request was received positively and as a result the TC has started brainstorming other topics that we could approach with the k8s community in this [1] etherpad. If you have topics that may be appropriate for this discussion please see the etherpad and add your ideas. Thanks! Jay IRC: jungleboyj [1] https://etherpad.opendev.org/p/kubernetes-cross-community-topics From smooney at redhat.com Fri Jul 10 12:58:15 2020 From: smooney at redhat.com (Sean Mooney) Date: Fri, 10 Jul 2020 13:58:15 +0100 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver In-Reply-To: <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> Message-ID: <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com> On Fri, 2020-07-10 at 13:37 +0800, yumeng bao wrote: > Brin, thanks for bringing this up! > > > Hi all: > > This release we want to introduce some 3rd party drivers (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) > > in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > > Due to the lack of CI test environment supported by hardware, we reached a temporary solution in two ways, as > > follows: > > 1. Provide a CI environment and provide a tempest test for Cyborg, this method is recommended; > > 2. If there is no CI environment, please provide the test results of this driver in the master branch or in the > > designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation > > of the commit. > > Providing test result can be our option. 
The test result can be part of the driver documentation[0] as this is public > to users. > And from my understanding, the test result should work as the role of tempest case and clarify at least: necessary > configuration,test operations and test results. i would advise against including the resulsts in docuemntation add int test results to a commit or provideing tiem at the poitn it merged just tells you it once worked on the developers system likely using devstack to deploy. it does not tell you that it still work after even a singel addtional commit has been merged. so i would sugges not adding the results to the docs as they will get out dateded quickly. maintaining a wiki is fine but i woudl suggest considring any driver that does not have first or thirdparty ci to be experimental. the generic mdev driver we talked about can be tested using sampel kernel modules that provide realy mdevs implemnetaion of srial consoles or graphics devices. so it could be validated in first party ci and consider supported/non experimaental. if other driver can similarly be tested with virtual hardware or sample kernel modules that allowed testing in the first party ci they could alos be marked as fully supported. with out that level of testing however i would not advertise a driver as anything more then experimental. the old rule when i started working on openstack was if its not tested in ci its broken. > > [0] https://docs.openstack.org/cyborg/latest/reference/support-matrix.html#driver-support > > > > [1] http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-07-02-03.05.log.html > > Regards, > Yumeng > From ionut at fleio.com Fri Jul 10 13:50:32 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 10 Jul 2020 16:50:32 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi again, I did not manage to make it work, I cannot figure out how to connect all the pieces. pollsters.d/octavia.yaml https://paste.xinu.at/DERxh1/ pipeline.yaml https://paste.xinu.at/u1E42/ polling.yaml https://paste.xinu.at/MZWNs/ gnocchi_resources.yaml https://paste.xinu.at/j3AX/ gnocchi_client.py in resources_update_operations https://paste.xinu.at/no5/ gnocchi resource-type show https://paste.xinu.at/7mZIyZ/ Do you mind if you do a full example using "dynamic.network.services.vpn.connection" from https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html ? Or maybe you can point me to the mistakes made in my configuration? On Tue, Jul 7, 2020 at 2:43 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > That is the right direction. I don't know why people hard-coded the > initial pollsters' configs and did not document the relation between > Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a > single system, but interdependent systems to implement a monitoring > solution. Ceilometer is the component that gathers data/information, > processes, and then persists it somewhere. Gnocchi is one of the options > that Ceilometer can use to persist data. By default, Ceilometer creates > some basic configurations in Gnocchi to store data, such as some default > resource-types with default attributes. However, we do not need (should > not) rely on this default config. > > You can create and use custom resources to fit the stack to your needs. > This can be achieved via `gnocchi resource-type create -a > :: ` and > `gnocchi resource-type create -u > :: `. 
> Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), > you can customize the mapping of metrics to resource-types in Gnocchi. > > On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: > >> Hello again, >> >> What's the proper way to handle dynamic pollsters in gnocchi ? >> Right now ceilometer returns: >> >> WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia >> is not handled by Gnocchi >> >> I found >> https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html >> but I'm not sure if is the right direction. >> >> On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: >> >>> Seems to work fine now. Thanks. >>> >>> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> It looks like a coding error that we left behind during a major >>>> refactoring that we introduced upstream. >>>> I created a patch for it. Can you check/review and test it? >>>> https://review.opendev.org/739555 >>>> >>>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>>> >>>>> Hi Rafael, >>>>> >>>>> I have an error and I cannot resolve it myself. >>>>> >>>>> https://paste.xinu.at/LEfdXD/ >>>>> >>>>> Do you happen to know what's wrong? >>>>> >>>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>>> >>>>> >>>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> Good catch. I fixed the docs. >>>>>> https://review.opendev.org/#/c/739288/ >>>>>> >>>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I just noticed that the example >>>>>>> dynamic.network.services.vpn.connection from >>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>>> the wrong indentation. >>>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>>> >>>>>>> Now I have to see why is not polling from it >>>>>>> >>>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>>> >>>>>>>> Hi Rafael, >>>>>>>> >>>>>>>> I think I applied all the reviews successfully but I tried to do an >>>>>>>> octavia dynamic poller but I have couples of errors. >>>>>>>> >>>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>>> Error is about syntax error near name: >>>>>>>> https://paste.xinu.at/MHgDBY/ >>>>>>>> >>>>>>>> if i remove the - in front of name like this: >>>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>>> >>>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>> and those will be available for victoria? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>>>> >>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>> production? >>>>>>>>>> >>>>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>>>> the code knows what she/he is doing, you should be safe. 
The guys that are >>>>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>>>> have a few openstack components that are customized with the >>>>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>>>> not using the community version, but something in-between (the community >>>>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>>>> quality for production. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>>>>> >>>>>>>>>> Hello Rafael, >>>>>>>>>> >>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>> and those will be available for victoria? >>>>>>>>>> >>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>> production? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>>> that might be important for your use case: >>>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer >>>>>>>>>>>> to the Ceilometer project. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I found >>>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I was wondering if there is a way to meter >>>>>>>>>>>>>>> loadbalancers from octavia. >>>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>>>> returns the full load balancer status tree [1]. >>>>>>>>>>>>>> Additionally, Octavia has three API endpoints for statistics >>>>>>>>>>>>>> [2][3][4]. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I hope this helps with your use case. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Carlos >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>>> [2] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>>> [3] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>>> [4] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Rafael Weingärtner >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Fri Jul 10 14:01:26 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 10 Jul 2020 16:01:26 +0200 Subject: [octavia] Replace broken amphoras Message-ID: Hi, we had some network issues and now have amphoras which are marked in ERROR state. What we already tried: - failover the amphora - failover the loadbalancer both did not work, got "unable to attach port to (new) amphora". Then we removed the vrrp_port, set the vrrp_port_id to NULL and repeated the amphora failover Reverting Err: "PortID: Null" Then we created a new vrrp_port as described [1] and added the port-id to the vrrp_port_id and the a suitable vrrp_ip field to our ERRORed amphora entry. Restarted failover -> without luck. Currently we have an single STANDALONE amphora configured. Is there a way to trigger octavia to create new "clean" amphoras for MASTER/BACKUP? Thanks, Fabian [1] http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Fri Jul 10 14:03:10 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 10 Jul 2020 09:03:10 -0500 Subject: [release] Release countdown for week R-13 July 13 - July 17 Message-ID: <20200710140310.GA2336490@sm-workstation> Development Focus ----------------- The Victoria-2 milestone will happen in a few weeks, on July 30. Victoria-related specs should now be finalized so that teams can move to implementation ASAP. Some teams observe specific deadlines on the second milestone (mostly spec freezes): please refer to https://releases.openstack.org/victoria/schedule.html for details. 
General Information ------------------- Please remember that libraries need to be released at least once per milestone period. At milestone 2, the release team will propose releases for any library that has not been otherwise released since milestone 1. Other non-library deliverables that follow the cycle-with-intermediary release model should have an intermediary release before milestone-2. Those who haven't will be proposed to switch to the cycle-with-rc model, which is more suited to deliverables that are released only once per cycle. At milestone-2 we also freeze the contents of the final release. If you have a new deliverable that should be included in the final release, you should make sure it has a deliverable file in: https://opendev.org/openstack/releases/src/branch/master/deliverables/victoria You should request a beta release (or intermediary release) for those new deliverables by milestone-2. We understand some may not be quite ready for a full release yet, but if you have something minimally viable to get released it would be good to do a 0.x release to exercise the release tooling for your deliverables. See the MembershipFreeze description for more details: https://releases.openstack.org/victoria/schedule.html#v-mf Finally, now may be a good time for teams to check on any stable releases that need to be done for your deliverables. If you have bugfixes that have been backported, but no stable release getting those. If you are unsure what is out there committed but not released, in the openstack/releases repo, running the command "tools/list_stable_unreleased_changes.sh " gives a nice report. Upcoming Deadlines & Dates -------------------------- Victoria-2 milestone: July 30 From rosmaita.fossdev at gmail.com Fri Jul 10 16:11:03 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 10 Jul 2020 12:11:03 -0400 Subject: [cinder] monthly video meeting poll results Message-ID: tl;dr - our first video meeting will be Wednesday 29 July connection info will be on the agenda etherpad, https://etherpad.opendev.org/p/cinder-victoria-meetings For those who didn't see the poll, this is what it was about: We're considering holding the Cinder weekly meeting as a video conference once each month. It will be the last meeting of each month and will take place at the regularly scheduled meeting time (1400 UTC for 60 minutes). Video Meeting Rules: * Everyone will keep IRC open during the meeting. * We'll take notes in IRC to leave a record similar to what we have for our regular IRC meetings. * Some people are more comfortable communicating in written English. So at any point, any attendee may request that the discussion of the current topic be conducted entirely in IRC. The results: Do it? - 50% in favor, 33% in strong favor, 17% don't care, no one opposed. Record? - 50% yes, 50% don't care Conferencing software? - Bluejeans: first choice of 70% of respondents Comments - Let's work hard to write what we speak! - people who don't want to be recorded can turn their camera off - video conference plus IRC is for sure better than IRC only - Zoom is shady and possibly not appropriate for an open source project that wants to welcome contributors from all countries. I think we're better off avoiding it. Conclusion: We'll hold the Cinder weekly meeting for 29 July in BlueJeans *and* IRC following the ground rules laid out above, and continue doing the same for the last meeting of each month through the end of the Victoria cycle. The meetings will be recorded. 
From pramchan at yahoo.com Fri Jul 10 16:55:08 2020
From: pramchan at yahoo.com (prakash RAMCHANDRAN)
Date: Fri, 10 Jul 2020 16:55:08 +0000 (UTC)
Subject: [all][InteropWG] weekly Friday call - request for Participation
References: <1093529208.5519657.1594400108326.ref@mail.yahoo.com>
Message-ID: <1093529208.5519657.1594400108326@mail.yahoo.com>

Hi all,

We have the agenda listed on https://etherpad.opendev.org/p/interop
The call is now moved to meetpad and will be easy to access; try it out today in the next 10-15 minutes if you find time and interest.

OpenStack / Open Infra - InteropWG
Interop Working Group - Weekly Friday 10-11 AM or UTC 17-18
Link: https://meetpad.opendev.org/Interop-WG-weekly-meeting

Thanks,
Prakash
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mark at stackhpc.com Fri Jul 10 17:29:41 2020
From: mark at stackhpc.com (Mark Goddard)
Date: Fri, 10 Jul 2020 18:29:41 +0100
Subject: [kolla] PTL holiday
Message-ID: 

Hi,

I will be on holiday next Tuesday (14th) to Thursday (16th). I will
therefore miss both the IRC meeting and Kolla Klub. If someone is able
to chair the IRC meeting, please reply here. There is currently
nothing on the agenda for the Klub, so anyone looking to chair that
meeting will also need to find some topics to cover. We have
suggestions in [1].

Cheers,
Mark

[1] https://docs.google.com/document/d/1EwQs2GXF-EvJZamEx9vQAOSDB5tCjsDCJyHQN5_4_Sw

From anilj.mailing at gmail.com Fri Jul 10 17:44:35 2020
From: anilj.mailing at gmail.com (Anil Jangam)
Date: Fri, 10 Jul 2020 10:44:35 -0700
Subject: OpenStack cluster event notification
In-Reply-To: <59G5DQ.5J8F1FJXF7IT3@est.tech>
References: <59G5DQ.5J8F1FJXF7IT3@est.tech>
Message-ID: 

Hello Gibi.

I looked at your sample code. Other than providing the username and
password of the user in the transport URL,

*transport = oslo_messaging.get_notification_transport(cfg.CONF,
url='rabbit://stackrabbit:admin at 100.109.0.10:5672/')*

what changes are to be done in the nova.conf file? Can you please provide
the exact set of changes?

/anil.

On Wed, Jul 8, 2020 at 5:05 AM Balázs Gibizer wrote:

>
>
> On Tue, Jul 7, 2020 at 16:32, Julia Kreger
> wrote:
> [snip]
> >
> > Although that being said, I don't think much would really prevent you
> > from consuming the notifications directly from the message bus, if you
> > so desire. Maybe someone already has some code for this on hand.
>
> Here is some example code that forwards the nova versioned
> notifications from the message bus out to a client via websocket [1]. I
> used this sample code in my demo [2] during a summit presentation.
>
> Cheers,
> gibi
>
> [1]
> https://github.com/gibizer/nova-notification-demo/blob/master/ws_forwarder.py
> [2] https://www.youtube.com/watch?v=WFq5JWXa9AM
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rafaelweingartner at gmail.com Fri Jul 10 17:24:17 2020
From: rafaelweingartner at gmail.com (Rafael Weingärtner)
Date: Fri, 10 Jul 2020 14:24:17 -0300
Subject: [ceilometer][octavia] polling meters
In-Reply-To: 
References: 
Message-ID: 

Sure, this is a minimalistic config I used for testing (watch for the indentation issues that might happen due to copy/paste into Gmail).
> cat ceilometer/pollsters.d/vpn-connection-dynamic-pollster.yaml > --- > > - name: "dynamic_pollster.network.services.vpn.connection" > sample_type: "gauge" > unit: "ipsec_site_connection" > value_attribute: "status" > endpoint_type: "network" > url_path: "v2.0/vpn/ipsec-site-connections" > metadata_fields: > - "name" > - "vpnservice_id" > - "description" > - "status" > - "peer_address" > value_mapping: > ACTIVE: "1" > DOWN: "0" > metadata_mapping: > name: "display_name" > default_value: 0 > Then, the polling.yaml file cat ceilometer/polling.yaml | grep -A 3 vpnass > - name: vpnass_pollsters > interval: 600 > meters: > - dynamic_pollster.network.services.vpn.connection > And last, but not least, the custom_gnocchi_resources file. > cat ceilometer/custom_gnocchi_resources.yaml | grep -B 2 -A 9 > "dynamic_pollster.network.services.vpn.connection" > - resource_type: s2svpn > metrics: > dynamic_pollster.network.services.vpn.connection: > attributes: > name: resource_metadata.name > vpnservice_id: resource_metadata.vpnservice_id > description: resource_metadata.description > status: resource_metadata.status > peer_address: resource_metadata.peer_address > display_name: resource_metadata.display_name > Bear in mind that you need to create the Gnocchi resource type. > gnocchi resource-type show s2svpn > > +--------------------------+-----------------------------------------------------------+ > | Field | Value > | > > +--------------------------+-----------------------------------------------------------+ > | attributes/description | max_length=255, min_length=0, required=False, > type=string | > | attributes/display_name | max_length=255, min_length=0, required=False, > type=string | > | attributes/name | max_length=255, min_length=0, required=False, > type=string | > | attributes/peer_address | max_length=255, min_length=0, required=False, > type=string | > | attributes/status | max_length=255, min_length=0, required=False, > type=string | > | attributes/vpnservice_id | required=False, type=uuid > | > | name | s2svpn > | > | state | active > | > > +--------------------------+-----------------------------------------------------------+ > What is the problem you are having? On Fri, Jul 10, 2020 at 10:50 AM Ionut Biru wrote: > Hi again, > > I did not manage to make it work, I cannot figure out how to connect all > the pieces. > > pollsters.d/octavia.yaml https://paste.xinu.at/DERxh1/ > pipeline.yaml https://paste.xinu.at/u1E42/ > polling.yaml https://paste.xinu.at/MZWNs/ > gnocchi_resources.yaml https://paste.xinu.at/j3AX/ > gnocchi_client.py in resources_update_operations > https://paste.xinu.at/no5/ > gnocchi resource-type show https://paste.xinu.at/7mZIyZ/ > Do you mind if you do a full example > using "dynamic.network.services.vpn.connection" from > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html > ? > > Or maybe you can point me to the mistakes made in my configuration? > > > On Tue, Jul 7, 2020 at 2:43 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> That is the right direction. I don't know why people hard-coded the >> initial pollsters' configs and did not document the relation between >> Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a >> single system, but interdependent systems to implement a monitoring >> solution. Ceilometer is the component that gathers data/information, >> processes, and then persists it somewhere. Gnocchi is one of the options >> that Ceilometer can use to persist data. 
By default, Ceilometer creates >> some basic configurations in Gnocchi to store data, such as some default >> resource-types with default attributes. However, we do not need (should >> not) rely on this default config. >> >> You can create and use custom resources to fit the stack to your needs. >> This can be achieved via `gnocchi resource-type create -a >> :: ` and >> `gnocchi resource-type create -u >> :: `. >> Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), >> you can customize the mapping of metrics to resource-types in Gnocchi. >> >> On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: >> >>> Hello again, >>> >>> What's the proper way to handle dynamic pollsters in gnocchi ? >>> Right now ceilometer returns: >>> >>> WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia >>> is not handled by Gnocchi >>> >>> I found >>> https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html >>> but I'm not sure if is the right direction. >>> >>> On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: >>> >>>> Seems to work fine now. Thanks. >>>> >>>> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> It looks like a coding error that we left behind during a major >>>>> refactoring that we introduced upstream. >>>>> I created a patch for it. Can you check/review and test it? >>>>> https://review.opendev.org/739555 >>>>> >>>>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>>>> >>>>>> Hi Rafael, >>>>>> >>>>>> I have an error and I cannot resolve it myself. >>>>>> >>>>>> https://paste.xinu.at/LEfdXD/ >>>>>> >>>>>> Do you happen to know what's wrong? >>>>>> >>>>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>>>> >>>>>> >>>>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>>>> rafaelweingartner at gmail.com> wrote: >>>>>> >>>>>>> Good catch. I fixed the docs. >>>>>>> https://review.opendev.org/#/c/739288/ >>>>>>> >>>>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I just noticed that the example >>>>>>>> dynamic.network.services.vpn.connection from >>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>>>> the wrong indentation. >>>>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>>>> >>>>>>>> Now I have to see why is not polling from it >>>>>>>> >>>>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>>>> >>>>>>>>> Hi Rafael, >>>>>>>>> >>>>>>>>> I think I applied all the reviews successfully but I tried to do >>>>>>>>> an octavia dynamic poller but I have couples of errors. >>>>>>>>> >>>>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>>>> Error is about syntax error near name: >>>>>>>>> https://paste.xinu.at/MHgDBY/ >>>>>>>>> >>>>>>>>> if i remove the - in front of name like this: >>>>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>>>> >>>>>>>>> Is there something I missed or is something wrong in yaml? 
>>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>>>>> >>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>> production? >>>>>>>>>>> >>>>>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>>>>> the code knows what she/he is doing, you should be safe. The guys that are >>>>>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>>>>> have a few openstack components that are customized with the >>>>>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>>>>> not using the community version, but something in-between (the community >>>>>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>>>>> quality for production. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hello Rafael, >>>>>>>>>>> >>>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>> >>>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>> production? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>>>> that might be important for your use case: >>>>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer >>>>>>>>>>>>> to the Ceilometer project. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I found >>>>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I was wondering if there is a way to meter >>>>>>>>>>>>>>>> loadbalancers from octavia. >>>>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>>>>> returns the full load balancer status tree [1]. >>>>>>>>>>>>>>> Additionally, Octavia has three API endpoints for >>>>>>>>>>>>>>> statistics [2][3][4]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I hope this helps with your use case. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Carlos >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Rafael Weingärtner >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Rafael Weingärtner >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Fri Jul 10 19:55:19 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 10 Jul 2020 21:55:19 +0200 Subject: [blazar] IRC meetings cancelled next week Message-ID: Hello, As I will be on holiday next week (July 13-17), I have proposed that both IRC meetings are cancelled. We will meet again on Tuesday July 21 for EMEA and Thursday July 30 for Americas. Cheers, Pierre (priteau) From paye600 at gmail.com Fri Jul 10 20:23:45 2020 From: paye600 at gmail.com (Roman Gorshunov) Date: Fri, 10 Jul 2020 22:23:45 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: <61872A8F-5495-4C6E-AD86-14A61F9431A1@gmail.com> References: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> <61872A8F-5495-4C6E-AD86-14A61F9431A1@gmail.com> Message-ID: Hello Corne, The loci images are now updated [0]. Thanks to Andrii Ostapenko and reviewers. [0] https://hub.docker.com/u/loci Best regards, Roman Gorshunov On Thu, Jul 2, 2020 at 12:35 PM Roman Gorshunov wrote: > > Hello Corne, > > Thank you for your email. i have investigated the issue, and seems that we have image push broken for some time. > While we work on resolution, I could advice you to locally build images, if that suits you. > > I would post a reply here to the mailing list once issue is resolved. > Again, thank you for paying attention and informing us. 
>
> Best regards,
> Roman Gorshunov
>

From zhangbailin at inspur.com Sat Jul 11 01:41:41 2020
From: zhangbailin at inspur.com (Brin Zhang(张百林))
Date: Sat, 11 Jul 2020 01:41:41 +0000
Subject: Re: [cyborg] Temporary treatment plan for the 3rd-party driver
In-Reply-To: <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com>
References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com>
 <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com>
 <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com>
Message-ID: 

On Fri, 2020-07-10 at 13:37 +0800, yumeng bao wrote:
> Brin, thanks for bringing this up!
>
> > Hi all:
> > This release we want to introduce some 3rd-party drivers
> > (e.g. Intel QAT, Inspur FPGA, and Inspur SSD) in Cyborg, and we
> > discussed the handling of 3rd-party driver CI in the Cyborg IRC
> > meeting [1].
> > Due to the lack of a CI test environment supported by hardware,
> > we reached a temporary solution in two ways, as follows:
> > 1. Provide a CI environment and a tempest test for Cyborg; this
> > method is recommended.
> > 2. If there is no CI environment, please provide the test results of
> > this driver on the master branch or on the designated branch, which
> > should be as complete as possible, sent to the Cyborg team, or pasted
> > in the implementation of the commit.
>
> Providing test results can be our option. The test results can be part
> of the driver documentation[0] as this is public to users.
> And from my understanding, the test results should play the role of a
> tempest case and clarify at least: necessary configuration, test
> operations and test results.
>
> I would advise against including the results in documentation. Adding
> test results to a commit, or providing them at the point it merged,
> just tells you it once worked on the developer's system, likely using
> devstack to deploy. It does not tell you that it still works after even
> a single additional commit has been merged. So I would suggest not
> adding the results to the docs, as they will get outdated quickly.

Good advice, this is also my original intention: give the result
verification in the submitted commit, and do not put the test
verification results in the code base. As you said, this does not mean
that it will always work unless a test report can be provided regularly.
Of course, it is better if there is third-party CI; we will try our best
to fight for it.

> Maintaining a wiki is fine, but I would suggest considering any driver
> that does not have first- or third-party CI to be experimental. The
> generic mdev driver we talked about can be tested using sample kernel
> modules that provide real mdev implementations of serial consoles or
> graphics devices, so it could be validated in first-party CI and
> considered supported/non-experimental. If other drivers can similarly
> be tested with virtual hardware or sample kernel modules that allow
> testing in the first-party CI, they could also be marked as fully
> supported. Without that level of testing, however, I would not
> advertise a driver as anything more than experimental.
>
> The old rule when I started working on OpenStack was: if it's not
> tested in CI, it's broken.
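As an illustration of the sample-module idea (a rough sketch only, nothing is
wired into our CI yet): the in-tree mtty sample from the kernel's vfio-mdev
samples exposes a purely software mdev type that can be created from sysfs,
so a first-party job could exercise the generic mdev driver without any real
hardware, roughly:

  # modprobe mtty
  # UUID=$(uuidgen)
  # echo "$UUID" > /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-1/create
  # ls /sys/bus/mdev/devices/

The sysfs paths above are the ones from the kernel's vfio-mediated-device
documentation; whether the gate image actually ships the sample module built
is an open question.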
> > [0] > https://docs.openstack.org/cyborg/latest/reference/support-matrix.html > #driver-support > > > > [1] > > http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openst > > ack_cyborg.2020-07-02-03.05.log.html > > Regards, > Yumeng > From radoslaw.piliszek at gmail.com Sat Jul 11 10:40:10 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sat, 11 Jul 2020 12:40:10 +0200 Subject: [kolla] PTL holiday In-Reply-To: References: Message-ID: On Fri, Jul 10, 2020 at 7:39 PM Mark Goddard wrote: > > Hi, > > I will be on holiday next Tuesday (14th) to Thursday (16th). I will > therefore miss both the IRC meeting and Kolla Klub. If someone is able > to chair the IRC meeting, please reply here. There is currently > nothing on the agenda for the Klub, so anyone looking to chair that > meeting will also need to find some topics to cover. We have > suggestions in [1]. I agree to chair them both. For Klub I suggest we run an open discussion panel. There is usually something to talk about but the formal agenda might sound scary. :-) -yoctozepto From reza.b2008 at gmail.com Sat Jul 11 12:55:36 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Sat, 11 Jul 2020 17:25:36 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: I found following error in ironic and container-puppet-ironic container log during installation: puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: > Hi, > > I'm going to install OpenStack Train with the help of TripleO on CentOS 8, > but undercloud installation fails with the following error: > > "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > 
Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", > "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ > exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- > Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 > -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: > 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 > 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 > 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} > > Any suggestion would be grateful. > Regards, > Reza > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Sat Jul 11 16:44:48 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Sat, 11 Jul 2020 11:44:48 -0500 Subject: [cinder] monthly video meeting poll results In-Reply-To: References: Message-ID: Brian, Thanks for putting this together and for the summary. Look forward to seeing you all later this month.  
:-) Jay On 7/10/2020 11:11 AM, Brian Rosmaita wrote: > tl;dr - our first video meeting will be Wednesday 29 July >   connection info will be on the agenda etherpad, >   https://etherpad.opendev.org/p/cinder-victoria-meetings > > For those who didn't see the poll, this is what it was about: > > We're considering holding the Cinder weekly meeting as a video > conference once each month. It will be the last meeting of each month > and will take place at the regularly scheduled meeting time (1400 UTC > for 60 minutes). > > Video Meeting Rules: > * Everyone will keep IRC open during the meeting. > * We'll take notes in IRC to leave a record similar to what we have > for our regular IRC meetings. > * Some people are more comfortable communicating in written English. > So at any point, any attendee may request that the discussion of the > current topic be conducted entirely in IRC. > > The results: > Do it? > - 50% in favor, 33% in strong favor, 17% don't care, no one opposed. > Record? > - 50% yes, 50% don't care > Conferencing software? > - Bluejeans: first choice of 70% of respondents > Comments > - Let's work hard to write what we speak! > - people who don't want to be recorded can turn their camera off > - video conference plus IRC is for sure better than IRC only > - Zoom is shady and possibly not appropriate for an open source > project that wants to welcome contributors from all countries. I think > we're better off avoiding it. > > Conclusion: > We'll hold the Cinder weekly meeting for 29 July in BlueJeans *and* > IRC following the ground rules laid out above, and continue doing the > same for the last meeting of each month through the end of the > Victoria cycle.  The meetings will be recorded. > From aschultz at redhat.com Sun Jul 12 21:09:45 2020 From: aschultz at redhat.com (Alex Schultz) Date: Sun, 12 Jul 2020 15:09:45 -0600 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: I don't believe centos8 containers are available for Train yet. The error you're hitting is because it's fetching centos7 containers and the ironic container is not backwards compatible between the two versions. If you want centos8, use Ussuri. 
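For what it's worth, which containers get pulled is driven by the
ContainerImagePrepare entry in the environment file referenced by
container_images_file in undercloud.conf (usually
containers-prepare-parameter.yaml). A rough sketch of the relevant part -
the namespace/tag values below are only illustrative, check what your own
file actually points at:

  parameter_defaults:
    ContainerImagePrepare:
    - push_destination: true
      set:
        namespace: docker.io/tripleotrain
        name_prefix: centos-binary-
        tag: current-tripleo

If that namespace only carries centos7-built images, you will keep hitting
the ironic/tftpboot failure you pasted.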
On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi wrote: > > I found following error in ironic and container-puppet-ironic container log during installation: > > puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 > > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: >> >> Hi, >> >> I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: >> >> "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} >> >> Any suggestion would be grateful. >> Regards, >> Reza >> >> From marios at redhat.com Mon Jul 13 06:20:14 2020 From: marios at redhat.com (Marios Andreou) Date: Mon, 13 Jul 2020 09:20:14 +0300 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: Hi folks, On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: > I don't believe centos8 containers are available for Train yet. The > error you're hitting is because it's fetching centos7 containers and > the ironic container is not backwards compatible between the two > versions. If you want centos8, use Ussuri. 
> > fyi we started pushing centos8 train last week - slightly different namespace - latest current-tripleo containers are pushed to https://hub.docker.com/u/tripleotraincentos8 hope it helps > On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi > wrote: > > > > I found following error in ironic and container-puppet-ironic container > log during installation: > > > > puppet-user: Error: > /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: > Could not evaluate: Could not retrieve information from environment > production source(s) file:/tftpboot/ldlinux.c32 > > > > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi > wrote: > >> > >> Hi, > >> > >> I'm going to install OpenStack Train with the help of TripleO on CentOS > 8, but undercloud installation fails with the following error: > >> > >> "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: 
Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ > '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit > 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying > running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed > running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- > Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 > ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: > 95117 -- ERROR configuring zaqar"]} > >> > >> Any suggestion would be grateful. > >> Regards, > >> Reza > >> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyppe at gmail.com Mon Jul 13 07:43:51 2020 From: tonyppe at gmail.com (Tony Pearce) Date: Mon, 13 Jul 2020 15:43:51 +0800 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> References: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> Message-ID: Hi Bharat, many thanks for your super quick response to me last week. I really appreciate that, especially since I had been trying for so long on this issue here. I wanted to try out your suggestion before coming back and creating a reply. I tried your suggestion and at first, I got the same experience (failure) when creating a cluster. It appeared to stop in the same place as I described in the mail previous. I noticed some weird things with DNS integration (Designate) during the investigation [1] and [2]. I decided to remove Designate from Openstack and retest and now I am successfully able to deploy a kubernetes cluster! :) Regarding those 2 points: [1] - the configured designate zone was project.cloud.company.com and instance1 would be instance1.project.cloud.company.com however, the kube master instance hostname was getting master.cloud.company.com [2] - when doing a dns lookup on master.project.cloud.company.com the private IP was being returned instead of the floating IP. This meant that from outside the project, the instance couldnt be pinged by hostname. 
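For completeness, the template used for the successful runs was essentially
my original create command with your suggested label added - roughly, other
flags unchanged:

  openstack coe cluster template create k8s-cluster-template \
    --image Fedora-Atomic-27 \
    --keypair testpair \
    --external-network physnet2vlan20 \
    --dns-nameserver 192.168.7.233 \
    --flavor 2GB-2vCPU \
    --docker-volume-size 15 \
    --network-driver flannel \
    --coe kubernetes \
    --labels heat_container_agent_tag=train-stable-3

Sharing it here in case it helps anyone else hitting the same thing.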
I've removed both magnum and Designate and then redeployed both by first deploying Magnum and testing successful kubernetes cluster deployment using your fix Bharat. Then I deployed Designate again. Issue [1] is still present while issue [2] is resolved and no longer present. Kubernetes cluster deployment is still successful :) Thank you once again and have a great week ahead! Kind regards, Tony Pearce On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar wrote: > Hi Tony > > That is a known issue and is due to the default version of heat container > agent baked into Train release. Please use label > heat_container_agent_tag=train-stable-3 and you should be good to go. > > Cheers > > Bharat > > On 10 Jul 2020, at 09:18, Tony Pearce wrote: > > Hi team, I hope you are all keeping safe and well at the moment. > > I am trying to use magnum to launch a kubernetes cluster. I have tried > different images but currently using Fedora-Atomic 27. The cluster > deployment from the cluster template is failing and I am here to ask if you > could please point me in the right direction? I have become stuck and I am > uncertain how to further troubleshoot this. The cluster seems to fail a few > minutes after booting up the master node because after I see the logs > ([1],[2]), I do not see any progress in terms of new (different) logs or > load on the master. Then the 60-minute timeout is reached and fails the > cluster. > > I deployed this openstack stack using kayobe (kolla-ansible) and this is > version Train. This is deployed on CentOS 7 within docker containers. > Kayobe manages this deployment through the ansible playbooks. > > This was previously working some months back although I think I may have > used coreos image at that time, and that is also not working today. The > deployment would have been back around February 2020. I then deleted that > deployment and re-deployed. The only change being the hostname for > controller node as updated in the inventory file for the kayobe. > Since then which was a month or so back I've been unable to successfully > deploy a kubernetes cluster. I've tried other fedora-atomic images as well > as coreos without success. When using the coreos image and when tagging the > image with the coreos tag as per the magnum docs, the instance fails to > boot and goes to the rescue shell. However if I manually launch the coreos > image then it does successfully boot and get configured via cloud-init. All > of the deployment attempts stop at the same place when using fedora image > and I have a different experience if I disable TLS: > > TLS enabled: master launched, no nodes. Fails when > running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml > > TLS disabled: master and nodes launched but later fails. I > didnt investigate this very much. > > When looking for help around the web, I found this which looks to be the > same issue that I have at the moment (although he's deployed slightly > differently, using centos8 and mentions magnum 10): > > https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ > > > I have the same log messages on the master node within heat. > > When going through the troubleshooting guide I see that etcd is running > and no errors however I dont see any flannel service at all. But I also > don't know if this has simply failed before getting to deploy flannel or > whether flannel is the reason. 
I did try to deploy using a cluster template > that is using calico as a test but the same result from the logs. > > When looking at the stack via cli to see the failed stacks this is what I > see there: http://paste.openstack.org/show/795736/ > > I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and > 2GB memory. > Storage is only via cinder as I am using iscsi storage with a cinder > driver. I dont have any other storage. > > On the master, after the failure the heat log repeats these logs: > > ++ curl --silent http://127.0.0.1:8080/healthz > + '[' ok = ok ']' > + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch > '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}' > error: no configuration has been provided, try setting KUBERNETES_MASTER > environment variable > Trying to label master node with node-role.kubernetes.io/master="" > + echo 'Trying to label master node with node-role.kubernetes.io/master= > ""' > + sleep 5s > > [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ > [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ > > May I ask if anyone has a recent deployment of Magnum and a working > deployment of kubernetes that could share with me the relevant details like > the image you have used so that I can try and replicate? > > To create the cluster template I have been using: > openstack coe cluster template create k8s-cluster-template \ > --image Fedora-Atomic-27 \ > --keypair testpair \ > --external-network physnet2vlan20 \ > --dns-nameserver 192.168.7.233 \ > --flavor 2GB-2vCPU \ > --docker-volume-size 15 \ > --network-driver flannel \ > --coe kubernetes > > > If I have missed anything, I am happy to provide it. > > Many thanks in advance for any help or pointers on this. > > Regards, > > Tony Pearce > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Mon Jul 13 08:35:39 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Mon, 13 Jul 2020 09:35:39 +0100 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: References: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> Message-ID: <89DC9036-27A3-48E6-9AD6-03B6577C9CB5@stackhpc.com> Hi Tony I have not used designate myself so not sure about the exact details but if you are using Kayobe/Kolla-Ansible, we recently proposed these backports to train, https://review.opendev.org/#/c/738882/1/ansible/roles/magnum/templates/magnum.conf.j2 . Magnum queries Keystone catalog for the url instances can use to talk back with Keystone and Magnum itself. Usually this is the public URL but essentially you need to specify an endpoint name which fits the bill. Please check /etc/kolla/magnum-conductor/magnum.conf in your control plane where Magnum is deployed and ensure it it configured to the correct interface. Cheers Bharat > On 13 Jul 2020, at 08:43, Tony Pearce wrote: > > Hi Bharat, many thanks for your super quick response to me last week. I really appreciate that, especially since I had been trying for so long on this issue here. I wanted to try out your suggestion before coming back and creating a reply. > > I tried your suggestion and at first, I got the same experience (failure) when creating a cluster. It appeared to stop in the same place as I described in the mail previous. I noticed some weird things with DNS integration (Designate) during the investigation [1] and [2]. 
I decided to remove Designate from Openstack and retest and now I am successfully able to deploy a kubernetes cluster! :) > > Regarding those 2 points: > [1] - the configured designate zone was project.cloud.company.com and instance1 would be instance1.project.cloud.company.com however, the kube master instance hostname was getting master.cloud.company.com > [2] - when doing a dns lookup on master.project.cloud.company.com the private IP was being returned instead of the floating IP. This meant that from outside the project, the instance couldnt be pinged by hostname. > > I've removed both magnum and Designate and then redeployed both by first deploying Magnum and testing successful kubernetes cluster deployment using your fix Bharat. Then I deployed Designate again. Issue [1] is still present while issue [2] is resolved and no longer present. Kubernetes cluster deployment is still successful :) > > Thank you once again and have a great week ahead! > > Kind regards, > > Tony Pearce > > > > On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar > wrote: > Hi Tony > > That is a known issue and is due to the default version of heat container agent baked into Train release. Please use label heat_container_agent_tag=train-stable-3 and you should be good to go. > > Cheers > > Bharat > >> On 10 Jul 2020, at 09:18, Tony Pearce > wrote: >> >> Hi team, I hope you are all keeping safe and well at the moment. >> >> I am trying to use magnum to launch a kubernetes cluster. I have tried different images but currently using Fedora-Atomic 27. The cluster deployment from the cluster template is failing and I am here to ask if you could please point me in the right direction? I have become stuck and I am uncertain how to further troubleshoot this. The cluster seems to fail a few minutes after booting up the master node because after I see the logs ([1],[2]), I do not see any progress in terms of new (different) logs or load on the master. Then the 60-minute timeout is reached and fails the cluster. >> >> I deployed this openstack stack using kayobe (kolla-ansible) and this is version Train. This is deployed on CentOS 7 within docker containers. Kayobe manages this deployment through the ansible playbooks. >> >> This was previously working some months back although I think I may have used coreos image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed. The only change being the hostname for controller node as updated in the inventory file for the kayobe. >> Since then which was a month or so back I've been unable to successfully deploy a kubernetes cluster. I've tried other fedora-atomic images as well as coreos without success. When using the coreos image and when tagging the image with the coreos tag as per the magnum docs, the instance fails to boot and goes to the rescue shell. However if I manually launch the coreos image then it does successfully boot and get configured via cloud-init. All of the deployment attempts stop at the same place when using fedora image and I have a different experience if I disable TLS: >> >> TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml >> >> TLS disabled: master and nodes launched but later fails. I didnt investigate this very much. 
>> >> When looking for help around the web, I found this which looks to be the same issue that I have at the moment (although he's deployed slightly differently, using centos8 and mentions magnum 10): >> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ >> >> I have the same log messages on the master node within heat. >> >> When going through the troubleshooting guide I see that etcd is running and no errors however I dont see any flannel service at all. But I also don't know if this has simply failed before getting to deploy flannel or whether flannel is the reason. I did try to deploy using a cluster template that is using calico as a test but the same result from the logs. >> >> When looking at the stack via cli to see the failed stacks this is what I see there: http://paste.openstack.org/show/795736/ >> >> I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and 2GB memory. >> Storage is only via cinder as I am using iscsi storage with a cinder driver. I dont have any other storage. >> >> On the master, after the failure the heat log repeats these logs: >> >> ++ curl --silent http://127.0.0.1:8080/healthz >> + '[' ok = ok ']' >> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master ": ""}}}' >> error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable >> Trying to label master node with node-role.kubernetes.io/master= "" >> + echo 'Trying to label master node with node-role.kubernetes.io/master= ""' >> + sleep 5s >> >> [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ >> [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ >> >> May I ask if anyone has a recent deployment of Magnum and a working deployment of kubernetes that could share with me the relevant details like the image you have used so that I can try and replicate? >> >> To create the cluster template I have been using: >> openstack coe cluster template create k8s-cluster-template \ >> --image Fedora-Atomic-27 \ >> --keypair testpair \ >> --external-network physnet2vlan20 \ >> --dns-nameserver 192.168.7.233 \ >> --flavor 2GB-2vCPU \ >> --docker-volume-size 15 \ >> --network-driver flannel \ >> --coe kubernetes >> >> >> If I have missed anything, I am happy to provide it. >> >> Many thanks in advance for any help or pointers on this. >> >> Regards, >> >> Tony Pearce >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyppe at gmail.com Mon Jul 13 08:41:44 2020 From: tonyppe at gmail.com (Tony Pearce) Date: Mon, 13 Jul 2020 16:41:44 +0800 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: <89DC9036-27A3-48E6-9AD6-03B6577C9CB5@stackhpc.com> References: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> <89DC9036-27A3-48E6-9AD6-03B6577C9CB5@stackhpc.com> Message-ID: Hi Bharat, Thank you again :) Tony Pearce On Mon, 13 Jul 2020 at 16:35, Bharat Kunwar wrote: > Hi Tony > > I have not used designate myself so not sure about the exact details but > if you are using Kayobe/Kolla-Ansible, we recently proposed these backports > to train, > https://review.opendev.org/#/c/738882/1/ansible/roles/magnum/templates/magnum.conf.j2. > Magnum queries Keystone catalog for the url instances can use to talk back > with Keystone and Magnum itself. Usually this is the public URL but > essentially you need to specify an endpoint name which fits the bill. 
> Please check /etc/kolla/magnum-conductor/magnum.conf in your control plane > where Magnum is deployed and ensure it it configured to the correct > interface. > > > Cheers > > Bharat > > On 13 Jul 2020, at 08:43, Tony Pearce wrote: > > Hi Bharat, many thanks for your super quick response to me last week. I > really appreciate that, especially since I had been trying for so long on > this issue here. I wanted to try out your suggestion before coming back and > creating a reply. > > I tried your suggestion and at first, I got the same experience (failure) > when creating a cluster. It appeared to stop in the same place as I > described in the mail previous. I noticed some weird things with DNS > integration (Designate) during the investigation [1] and [2]. I decided to > remove Designate from Openstack and retest and now I am successfully able > to deploy a kubernetes cluster! :) > > Regarding those 2 points: > [1] - the configured designate zone was project.cloud.company.com and > instance1 would be instance1.project.cloud.company.com however, the kube > master instance hostname was getting master.cloud.company.com > [2] - when doing a dns lookup on master.project.cloud.company.com the > private IP was being returned instead of the floating IP. This meant that > from outside the project, the instance couldnt be pinged by hostname. > > I've removed both magnum and Designate and then redeployed both by first > deploying Magnum and testing successful kubernetes cluster deployment using > your fix Bharat. Then I deployed Designate again. Issue [1] is still > present while issue [2] is resolved and no longer present. Kubernetes > cluster deployment is still successful :) > > Thank you once again and have a great week ahead! > > Kind regards, > > Tony Pearce > > > > On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar wrote: > >> Hi Tony >> >> That is a known issue and is due to the default version of heat container >> agent baked into Train release. Please use label >> heat_container_agent_tag=train-stable-3 and you should be good to go. >> >> Cheers >> >> Bharat >> >> On 10 Jul 2020, at 09:18, Tony Pearce wrote: >> >> Hi team, I hope you are all keeping safe and well at the moment. >> >> I am trying to use magnum to launch a kubernetes cluster. I have tried >> different images but currently using Fedora-Atomic 27. The cluster >> deployment from the cluster template is failing and I am here to ask if you >> could please point me in the right direction? I have become stuck and I am >> uncertain how to further troubleshoot this. The cluster seems to fail a few >> minutes after booting up the master node because after I see the logs >> ([1],[2]), I do not see any progress in terms of new (different) logs or >> load on the master. Then the 60-minute timeout is reached and fails the >> cluster. >> >> I deployed this openstack stack using kayobe (kolla-ansible) and this is >> version Train. This is deployed on CentOS 7 within docker containers. >> Kayobe manages this deployment through the ansible playbooks. >> >> This was previously working some months back although I think I may have >> used coreos image at that time, and that is also not working today. The >> deployment would have been back around February 2020. I then deleted that >> deployment and re-deployed. The only change being the hostname for >> controller node as updated in the inventory file for the kayobe. >> Since then which was a month or so back I've been unable to successfully >> deploy a kubernetes cluster. 
I've tried other fedora-atomic images as well >> as coreos without success. When using the coreos image and when tagging the >> image with the coreos tag as per the magnum docs, the instance fails to >> boot and goes to the rescue shell. However if I manually launch the coreos >> image then it does successfully boot and get configured via cloud-init. All >> of the deployment attempts stop at the same place when using fedora image >> and I have a different experience if I disable TLS: >> >> TLS enabled: master launched, no nodes. Fails when >> running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml >> >> TLS disabled: master and nodes launched but later fails. I >> didnt investigate this very much. >> >> When looking for help around the web, I found this which looks to be the >> same issue that I have at the moment (although he's deployed slightly >> differently, using centos8 and mentions magnum 10): >> >> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ >> >> >> I have the same log messages on the master node within heat. >> >> When going through the troubleshooting guide I see that etcd is running >> and no errors however I dont see any flannel service at all. But I also >> don't know if this has simply failed before getting to deploy flannel or >> whether flannel is the reason. I did try to deploy using a cluster template >> that is using calico as a test but the same result from the logs. >> >> When looking at the stack via cli to see the failed stacks this is what I >> see there: http://paste.openstack.org/show/795736/ >> >> I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu >> and 2GB memory. >> Storage is only via cinder as I am using iscsi storage with a cinder >> driver. I dont have any other storage. >> >> On the master, after the failure the heat log repeats these logs: >> >> ++ curl --silent http://127.0.0.1:8080/healthz >> + '[' ok = ok ']' >> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch >> '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}' >> error: no configuration has been provided, try setting KUBERNETES_MASTER >> environment variable >> Trying to label master node with node-role.kubernetes.io/master="" >> + echo 'Trying to label master node with node-role.kubernetes.io/master= >> ""' >> + sleep 5s >> >> [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ >> [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ >> >> May I ask if anyone has a recent deployment of Magnum and a working >> deployment of kubernetes that could share with me the relevant details like >> the image you have used so that I can try and replicate? >> >> To create the cluster template I have been using: >> openstack coe cluster template create k8s-cluster-template \ >> --image Fedora-Atomic-27 \ >> --keypair testpair \ >> --external-network physnet2vlan20 \ >> --dns-nameserver 192.168.7.233 \ >> --flavor 2GB-2vCPU \ >> --docker-volume-size 15 \ >> --network-driver flannel \ >> --coe kubernetes >> >> >> If I have missed anything, I am happy to provide it. >> >> Many thanks in advance for any help or pointers on this. >> >> Regards, >> >> Tony Pearce >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaser at vexxhost.com Mon Jul 13 17:37:57 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 13 Jul 2020 13:37:57 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. # Patches ## Open Reviews - [manila] assert:supports-accessible-upgrade https://review.opendev.org/740509 - Add legacy repository validation https://review.opendev.org/737559 - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 ## General Changes - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Update goal selection docs to clarify the goal count https://review.opendev.org/739150 - Add "tc:approved-release" tag to manila https://review.opendev.org/738105 # Email Threads - New Office Hours: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015761.html - Summit CFP Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015730.html # Other Reminders - OpenStack's 10th anniversary community meeting should be happening July 16th: more info coming soon! - If you're an operator, make sure you fill out our user survey: https://www.openstack.org/user-survey/survey-2020/ - Milestone 2 coming at the end of the month Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From rfolco at redhat.com Mon Jul 13 17:42:16 2020 From: rfolco at redhat.com (Rafael Folco) Date: Mon, 13 Jul 2020 14:42:16 -0300 Subject: [tripleo] TripleO CI Summary: Unified Sprint 29 Message-ID: Greetings, The TripleO CI team has just completed **Unified Sprint 29** (June 18 thru July 08). The following is a summary of completed work during this sprint cycle [1]: - Continued building internal component and integration pipelines. - Designed new promoter tests to run on Python3 to reuse common code and adapt molecule scenarios to the new test sequence standard. Design doc can be found at https://hackmd.io/kJqHSTWWRMOIfIhvDMGFLg. - Important changes to the next-generation promoter have been submitted, e.g. QCOW2 promotions https://review.rdoproject.org/r/#/c/27626/. - CentOS8 component and integration pipelines are still in progress to be completed in the next sprint cycle. - Tempest skip list and ironic plugin general improvements. - Ruck/Rover recorded notes [2]. The planned work for the next sprint [3] extends the work started in the previous sprint and focuses on switching container build jobs to the new building system. The Ruck and Rover for this sprint are Sandeep Yadav (ysandeep), Sorin Sbarnea (zbr). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes to be tracked in etherpad [4]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-29 [2] https://hackmd.io/XcuH2OIVTMiuxyrqSF6ocw [3] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-30 [4] https://hackmd.io/6Bx0FXwlRNCc75l39NSKvg -- Folco -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From haleyb.dev at gmail.com Mon Jul 13 18:01:16 2020 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 13 Jul 2020 14:01:16 -0400 Subject: [neutron] Bug deputy report for week of July 6th Message-ID: Hi, I was Neutron bug deputy last week. Below is a short summary about reported bugs. -Brian Critical bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1886807 - neutron-ovn-tempest-full-multinode-ovs-master job is failing 100% times - Gate failure High bugs --------- * https://bugs.launchpad.net/neutron/+bug/1886956 - Functional test test_restart_wsgi_on_sighup_multiple_workers is failing sometimes - https://review.opendev.org/#/c/740283/ * https://bugs.launchpad.net/neutron/+bug/1886969 - dhcp bulk reload fails with python3 - needs owner * https://bugs.launchpad.net/neutron/+bug/1887148 - Network loop between physical networks with DVR - https://review.opendev.org/#/c/740724/ proposed Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/1886909 - selection_fields for udp and sctp case doesn't work correctly - This is actually a bug in core OVN, and fixed in v20.06.1, should bump to test with a later version - Also related to supporting SCTP w/Octavia and adding that support to the ovn-octavia-provider driver * https://bugs.launchpad.net/neutron/+bug/1886962 - [OVN][QOS] NBDB qos table entries still exist even after corresponding neutron ports are deleted - needs owner * https://bugs.launchpad.net/neutron/+bug/1887108 - wrong l2pop flows on vlan network - asked for more information on config - needs owner * https://bugs.launchpad.net/neutron/+bug/1887163 - Failed to create network or port with dns_domain parameter - possible config issue with two dns extensions loaded at same time Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/1887147 - neutron-linuxbridge-agent looping same as dhcp - actually looks like a configuration issue in the deployment as privsep helper isn't able to start properly Wishlist bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1886798 - [RFE] Port NUMA affinity policy - Port object update for Nova scheduling - Needs discussion in Drivers meeting Further triage required ----------------------- * https://bugs.launchpad.net/neutron/+bug/1886426 - Neutron sending response to rabbitmq exchange with event type "floatingip.update.end" without updating the status of Floating IP - Asked for more information * https://bugs.launchpad.net/neutron/+bug/1886949 - [RFE] Granular metering data in neutron-metering-agent - Update to metering agent - Asked Slawek to take a look as there was talk about deprecating the metering agent From radoslaw.piliszek at gmail.com Mon Jul 13 19:53:03 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 13 Jul 2020 21:53:03 +0200 Subject: [masakari] Meetings Message-ID: Hello Fellow cloud-HA-seekers, I wanted to attend Masakari meetings but I found the current schedule unfit. Is there a chance to change the schedule? The day is fine but a shift by +3 hours would be nice. Anyhow, I wanted to discuss [1]. I've already proposed a change implementing it and looking forward to positive reviews. :-) That said, please reply on the change directly, or mail me or catch me on IRC, whichever option sounds best to you. 
[1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key -yoctozepto From amy at demarco.com Mon Jul 13 23:19:28 2020 From: amy at demarco.com (Amy Marrich) Date: Mon, 13 Jul 2020 18:19:28 -0500 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Hey Tom, Adding the OpenStack discuss list as I think you got several replies from there as well. Thanks, Amy (spotz) On Mon, Jul 13, 2020 at 5:37 PM Thomas King wrote: > Good day, > > I'm bringing up a thread from June about DHCP relay with neutron networks > in Ironic, specifically using unicast relay. The Triple-O docs do not have > the plain config/neutron config to show how a regular Ironic setup would > use DHCP relay. > > The Neutron segments docs state that I must have a unique physical network > name. If my Ironic controller has a single provisioning network with a > single physical network name, doesn't this prevent my use of multiple > segments? > > Further, the segments docs state this: "The operator must ensure that > every compute host that is supposed to participate in a router provider > network has direct connectivity to one of its segments." (section 3 at > https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - > current docs state the same thing) > This defeats the purpose of using DHCP relay, though, where the Ironic > controller does *not* have direct connectivity to the remote segment. > > Here is a rough drawing - what is wrong with my thinking here? > Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay > <------> Ironic controller, provisioning network: 10.146.29.192/26 VLAN > 2115 > > Thank you, > Tom King > _______________________________________________ > openstack-mentoring mailing list > openstack-mentoring at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Tue Jul 14 00:04:16 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 13 Jul 2020 17:04:16 -0700 Subject: [octavia] Replace broken amphoras In-Reply-To: References: Message-ID: Hi Fabian, Sorry you have run into trouble and we have missed you in the IRC channel. Yeah, that transcript from three years ago isn't going to be much help. A few things we will want to know are: 1. What version of Octavia are you using? 2. Do you have the DNS extension to neutron enabled? 3. When it said "unable to attach port to amphora", can you provide the full error? Was it due to a hostname mismatch error from nova? My guess is you ran into the issue where a port will not attach if the DNS name doesn't match. Our workaround for that accidentally got removed and re-added in https://review.opendev.org/#/c/663277/. Replacing a vrrp_port is tricky, so I'm not surprised you ran into some trouble. Can you please provide the controller worker log output when doing a load balancer failover (let's not use amphora failover here) on paste.openstack.org? You can mark it private and directly reply to me if you have concerns about the log content. All this said, I have recently completely refactored the failover flows recently. This has already merged on the master branch and backports are in process. Michael On Fri, Jul 10, 2020 at 7:07 AM Fabian Zimmermann wrote: > > Hi, > > we had some network issues and now have amphoras which are marked in ERROR state. 
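For reference, the load balancer failover and log collection Michael asks about are driven through the client, roughly as below; the UUID is a placeholder and the --loadbalancer filter on the amphora listing depends on your python-octaviaclient version:

  openstack loadbalancer failover <LB_ID>
  openstack loadbalancer amphora list --loadbalancer <LB_ID>

The worker log to capture afterwards is typically something like /var/log/octavia/worker.log, although the exact path depends on how Octavia is packaged in your deployment.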
> > What we already tried: > > - failover the amphora > - failover the loadbalancer > > both did not work, got "unable to attach port to (new) amphora". > > Then we removed the vrrp_port, set the vrrp_port_id to NULL and repeated the amphora failover > > Reverting Err: "PortID: Null" > > Then we created a new vrrp_port as described [1] and added the port-id to the vrrp_port_id and the a suitable vrrp_ip field to our ERRORed amphora entry. > > Restarted failover -> without luck. > > Currently we have an single STANDALONE amphora configured. > > Is there a way to trigger octavia to create new "clean" amphoras for MASTER/BACKUP? > > Thanks, > > Fabian > > [1]http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 From thomas.king at gmail.com Mon Jul 13 23:21:59 2020 From: thomas.king at gmail.com (Thomas King) Date: Mon, 13 Jul 2020 17:21:59 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Thank you, Amy! Tom On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: > Hey Tom, > > Adding the OpenStack discuss list as I think you got several replies from > there as well. > > Thanks, > > Amy (spotz) > > On Mon, Jul 13, 2020 at 5:37 PM Thomas King wrote: > >> Good day, >> >> I'm bringing up a thread from June about DHCP relay with neutron networks >> in Ironic, specifically using unicast relay. The Triple-O docs do not have >> the plain config/neutron config to show how a regular Ironic setup would >> use DHCP relay. >> >> The Neutron segments docs state that I must have a unique physical >> network name. If my Ironic controller has a single provisioning network >> with a single physical network name, doesn't this prevent my use of >> multiple segments? >> >> Further, the segments docs state this: "The operator must ensure that >> every compute host that is supposed to participate in a router provider >> network has direct connectivity to one of its segments." (section 3 at >> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >> current docs state the same thing) >> This defeats the purpose of using DHCP relay, though, where the Ironic >> controller does *not* have direct connectivity to the remote segment. >> >> Here is a rough drawing - what is wrong with my thinking here? >> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay >> <------> Ironic controller, provisioning network: 10.146.29.192/26 VLAN >> 2115 >> >> Thank you, >> Tom King >> _______________________________________________ >> openstack-mentoring mailing list >> openstack-mentoring at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yan.y.zhao at intel.com Mon Jul 13 23:29:57 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Tue, 14 Jul 2020 07:29:57 +0800 Subject: device compatibility interface for live migration with assigned devices Message-ID: <20200713232957.GD5955@joy-OptiPlex-7040> hi folks, we are defining a device migration compatibility interface that helps upper layer stack like openstack/ovirt/libvirt to check if two devices are live migration compatible. The "devices" here could be MDEVs, physical devices, or hybrid of the two. e.g. 
we could use it to check whether - a src MDEV can migrate to a target MDEV, - a src VF in SRIOV can migrate to a target VF in SRIOV, - a src MDEV can migration to a target VF in SRIOV. (e.g. SIOV/SRIOV backward compatibility case) The upper layer stack could use this interface as the last step to check if one device is able to migrate to another device before triggering a real live migration procedure. we are not sure if this interface is of value or help to you. please don't hesitate to drop your valuable comments. (1) interface definition The interface is defined in below way: __ userspace /\ \ / \write / read \ ________/__________ ___\|/_____________ | migration_version | | migration_version |-->check migration --------------------- --------------------- compatibility device A device B a device attribute named migration_version is defined under each device's sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). userspace tools read the migration_version as a string from the source device, and write it to the migration_version sysfs attribute in the target device. The userspace should treat ANY of below conditions as two devices not compatible: - any one of the two devices does not have a migration_version attribute - error when reading from migration_version attribute of one device - error when writing migration_version string of one device to migration_version attribute of the other device The string read from migration_version attribute is defined by device vendor driver and is completely opaque to the userspace. for a Intel vGPU, string format can be defined like "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". for an NVMe VF connecting to a remote storage. it could be "PCI ID" + "driver version" + "configured remote storage URL" for a QAT VF, it may be "PCI ID" + "driver version" + "supported encryption set". (to avoid namespace confliction from each vendor, we may prefix a driver name to each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) (2) backgrounds The reason we hope the migration_version string is opaque to the userspace is that it is hard to generalize standard comparing fields and comparing methods for different devices from different vendors. Though userspace now could still do a simple string compare to check if two devices are compatible, and result should also be right, it's still too limited as it excludes the possible candidate whose migration_version string fails to be equal. e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible with another MDEV with mdev_type_3, aggregator count 1, even their migration_version strings are not equal. (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). besides that, driver version + configured resources are all elements demanding to take into account. So, we hope leaving the freedom to vendor driver and let it make the final decision in a simple reading from source side and writing for test in the target side way. we then think the device compatibility issues for live migration with assigned devices can be divided into two steps: a. management tools filter out possible migration target devices. Tags could be created according to info from product specification. we think openstack/ovirt may have vendor proprietary components to create those customized tags for each product from each vendor. e.g. 
for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to search target vGPU are like: a tag for compatible parent PCI IDs, a tag for a range of gvt driver versions, a tag for a range of mdev type + aggregator count for NVMe VF, the tags to search target VF may be like: a tag for compatible PCI IDs, a tag for a range of driver versions, a tag for URL of configured remote storage. b. with the output from step a, openstack/ovirt/libvirt could use our proposed device migration compatibility interface to make sure the two devices are indeed live migration compatible before launching the real live migration process to start stream copying, src device stopping and target device resuming. It is supposed that this step would not bring any performance penalty as -in kernel it's just a simple string decoding and comparing -in openstack/ovirt, it could be done by extending current function check_can_live_migrate_destination, along side claiming target resources.[1] [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html Thanks Yan From mthode at mthode.org Tue Jul 14 04:10:46 2020 From: mthode at mthode.org (Matthew Thode) Date: Mon, 13 Jul 2020 23:10:46 -0500 Subject: Setuptools 48 and Devstack Failures In-Reply-To: <1731c9381f9.c3ec7029419955.5239287898505413558@ghanshyammann.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> <1731c9381f9.c3ec7029419955.5239287898505413558@ghanshyammann.com> Message-ID: <20200714041046.hzxrorrisrhdnhrv@mthode.org> On 20-07-04 20:24:55, Ghanshyam Mann wrote: > ---- On Fri, 03 Jul 2020 17:29:18 -0500 Ghanshyam Mann wrote ---- > > ---- On Fri, 03 Jul 2020 14:13:04 -0500 Clark Boylan wrote ---- > > > Hello, > > > > > > Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. > > > > > > Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. > > > > > > Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. > > > > Yeah, I am not sure how it will go as setuptools bug or an incompatible change and needs to handle on devstack side. > > As this is blocking all gates, let's use the old setuptools temporarily. 
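(If anyone needs a local stop-gap while this is sorted out, temporarily pinning back is just:

  pip install 'setuptools<48'

or, matching the blacklist mentioned later in this thread, 'setuptools!=48.0.0,!=49.0.0'. If the environment-variable workaround referred to below is the SETUPTOOLS_USE_DISTUTILS switch, which is an assumption on my part, then exporting SETUPTOOLS_USE_DISTUTILS=stdlib before installing should restore the pre-48 distutils behaviour.)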
For now, I filed devstack bug to track > > it and once we figure it out then move to latest setuptools - https://bugs.launchpad.net/devstack/+bug/1886237 > > > > This is patch to use old setuptools- > > - https://review.opendev.org/#/c/739290/ > > Updates: > Issue is when setuptools adopts distutils from the standard library (in 48.0.0) and uses it, downstream packagers customization to distutils will be lost. > - https://github.com/pypa/setuptools/issues/2232 > > setuptools 49.1.0 reverted the adoption of distutils from the standard library and its working now. > > I have closed the devstack bug 1886237 and proposed the revert of capping of setuptools by blacklisting 48.0.0 and 49.0.0 so > that we test with latest setuptools. For now, devstack will pick the 49.1.0 and pass. > - https://review.opendev.org/#/c/739294/2 > > In summary, gate is green and you can recheck on the failed patches. > It looks like they (upstream) are rolling forward with the change. There are workarounds for those that need it (env var). Please see the above linked issue for more information. > -gmann > > > > > > > > Clark > > > > > > > > > > > -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From ruslanas at lpic.lt Tue Jul 14 09:01:03 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 11:01:03 +0200 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: hi, have you checked: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html ? I am following this link. I only have one network, having different issues tho ;) On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: > Thank you, Amy! > > Tom > > On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: > >> Hey Tom, >> >> Adding the OpenStack discuss list as I think you got several replies from >> there as well. >> >> Thanks, >> >> Amy (spotz) >> >> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >> wrote: >> >>> Good day, >>> >>> I'm bringing up a thread from June about DHCP relay with neutron >>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>> not have the plain config/neutron config to show how a regular Ironic setup >>> would use DHCP relay. >>> >>> The Neutron segments docs state that I must have a unique physical >>> network name. If my Ironic controller has a single provisioning network >>> with a single physical network name, doesn't this prevent my use of >>> multiple segments? >>> >>> Further, the segments docs state this: "The operator must ensure that >>> every compute host that is supposed to participate in a router provider >>> network has direct connectivity to one of its segments." (section 3 at >>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>> current docs state the same thing) >>> This defeats the purpose of using DHCP relay, though, where the Ironic >>> controller does *not* have direct connectivity to the remote segment. >>> >>> Here is a rough drawing - what is wrong with my thinking here? 
>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay >>> <------> Ironic controller, provisioning network: 10.146.29.192/26 VLAN >>> 2115 >>> >>> Thank you, >>> Tom King >>> _______________________________________________ >>> openstack-mentoring mailing list >>> openstack-mentoring at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>> >> -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Tue Jul 14 12:26:11 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 14:26:11 +0200 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others Message-ID: Hi all, Borry to keep spamming you all the time. But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: - tuned ssh - automatically generated root pass to the one I need - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. - and so on. I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) Thank you in advance. -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza.b2008 at gmail.com Tue Jul 14 13:06:22 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Tue, 14 Jul 2020 17:36:22 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: Thanks for your information. Actually, I was in doubt of using Ussuri (latest version) for my environment. Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but overcloud image build got some error: $ openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml ... 2020-07-14 12:14:22.714 | Running install-packages install. 
2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync 2020-07-14 12:14:23.252 | DNF version: 4.2.17 2020-07-14 12:14:23.253 | cachedir: /tmp/yum 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux 8; generic; Linux.x86_64)' 2020-07-14 12:14:23.472 | repo: using cache for: AppStream 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 23:25:16 2020. 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 23:25:12 2020. 2020-07-14 12:14:23.517 | repo: using cache for: extras 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 00:15:26 2020. 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on Tue Jul 14 11:43:38 2020. 2020-07-14 12:14:23.767 | Completion plugin: Generating completion cache... 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is already installed. 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already installed. 
2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote 2020-07-14 12:14:23.929 | No match for argument: osops-tools-monitoring-oschecks 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker 2020-07-14 12:14:23.936 | No match for argument: crudini 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux 2020-07-14 12:14:23.953 | No match for argument: pacemaker 2020-07-14 12:14:23.957 | No match for argument: pcs 2020-07-14 12:14:23.961 | Error: Unable to find a match: python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs Do you have any idea? On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: > Hi folks, > > On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: > >> I don't believe centos8 containers are available for Train yet. The >> error you're hitting is because it's fetching centos7 containers and >> the ironic container is not backwards compatible between the two >> versions. If you want centos8, use Ussuri. >> >> > fyi we started pushing centos8 train last week - slightly different > namespace - latest current-tripleo containers are pushed to > https://hub.docker.com/u/tripleotraincentos8 > > hope it helps > > >> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi >> wrote: >> > >> > I found following error in ironic and container-puppet-ironic container >> log during installation: >> > >> > puppet-user: Error: >> /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: >> Could not evaluate: Could not retrieve information from environment >> production source(s) file:/tftpboot/ldlinux.c32 >> > >> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi >> wrote: >> >> >> >> Hi, >> >> >> >> I'm going to install OpenStack Train with the help of TripleO on >> CentOS 8, but undercloud installation fails with the following error: >> >> >> >> "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen >> 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping >> because of failed dependencies", "puppet-user: Notice: Applied catalog in >> 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: >> 97", "puppet-user: Events:", "puppet-user: Failure: 1", >> "puppet-user: Success: 97", "puppet-user: Total: 98", >> "puppet-user: Resources:", "puppet-user: Failed: 1", >> "puppet-user: Skipped: 41", "puppet-user: Changed: 97", >> "puppet-user: Out of sync: 98", "puppet-user: Total: >> 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", >> "puppet-user: Concat file: 0.00", "puppet-user: Anchor: >> 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: >> Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: >> Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", >> "puppet-user: Catalog application: 1.72", "puppet-user: Last >> run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: >> Total: 1.72", "puppet-user: Version:", "puppet-user: >> Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ >> '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit >> 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying >> running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed >> running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- >> Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 >> ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: >> 95117 -- ERROR configuring zaqar"]} >> >> >> >> Any suggestion would be grateful. 
>> >> Regards, >> >> Reza >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Jul 14 13:11:33 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:11:33 -0600 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi wrote: > > Thanks for your information. > Actually, I was in doubt of using Ussuri (latest version) for my environment. > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but overcloud image build got some error: > > $ openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > > ... > 2020-07-14 12:14:22.714 | Running install-packages install. > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux 8; generic; Linux.x86_64)' > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 23:25:16 2020. > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 23:25:12 2020. > 2020-07-14 12:14:23.517 | repo: using cache for: extras > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 00:15:26 2020. > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on Tue Jul 14 11:43:38 2020. > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion cache... 
> 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient > 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient > 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient > 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient > 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient > 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient > 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient > 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is already installed. > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already installed. > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote > 2020-07-14 12:14:23.929 | No match for argument: osops-tools-monitoring-oschecks > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker > 2020-07-14 12:14:23.936 | No match for argument: crudini > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux > 2020-07-14 12:14:23.953 | No match for argument: pacemaker > 2020-07-14 12:14:23.957 | No match for argument: pcs > 2020-07-14 12:14:23.961 | Error: Unable to find a match: python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs > > Do you have any idea? > Seems like you are missing the correct DIP_YUM_REPO_CONF setting per #3 from https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images > > > On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: >> >> Hi folks, >> >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: >>> >>> I don't believe centos8 containers are available for Train yet. The >>> error you're hitting is because it's fetching centos7 containers and >>> the ironic container is not backwards compatible between the two >>> versions. If you want centos8, use Ussuri. 
>>> >> >> fyi we started pushing centos8 train last week - slightly different namespace - latest current-tripleo containers are pushed to https://hub.docker.com/u/tripleotraincentos8 >> >> hope it helps >> >>> >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi wrote: >>> > >>> > I found following error in ironic and container-puppet-ironic container log during installation: >>> > >>> > puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 >>> > >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: >>> >> >>> >> Hi, >>> >> >>> >> I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: >>> >> >>> >> "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} >>> >> >>> >> Any suggestion would be grateful. >>> >> Regards, >>> >> Reza >>> >> >>> >> >>> >>> From aschultz at redhat.com Tue Jul 14 13:22:34 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:22:34 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: > > Hi all, > > Borry to keep spamming you all the time. > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: These don't necessarily need to be done in the image itself but you can virt customize the image prior to uploading it to the undercloud to inject some things. We provide ways of configuring these things at deployment time. > - tuned ssh We have sshd configured via a service. Available options are listed in the service file: https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > - automatically generated root pass to the one I need This can be done via a firstboot script. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) You'd probably want to do this via a first boot as well. 
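Picking up the DIP_YUM_REPO_CONF hint above: the variable is actually spelled DIB_YUM_REPO_CONF, and the "No match for argument" errors are what the image build prints when it only sees the stock CentOS AppStream/BaseOS/extras repos. A rough sketch of the fix on an Ussuri/CentOS 8 undercloud, assuming the delorean repo files from step #3 of the linked guide are already present in /etc/yum.repos.d:

  export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
  openstack overcloud image build \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml

The exact list of repo files to put in DIB_YUM_REPO_CONF varies per release, so use the glob given by the deploy guide for yours.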
If you are deploying with overcloud images, technically you shouldn't need a proxy on install but you'd likely need one for subsequent updates. > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. See KernelArgs. https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > - and so on. > Your best reference for what is available is likely going to be by looking in the THT/deployment folder for the service configurations. We don't expose everything but we do allow configurability for a significant amount of options. *ExtraConfig can be used to tweak additional options that we don't necessarily expose directly if you know what options need to be set via the appropriate puppet modules. If there are services we don't actually configure, you can define your own custom tripleo service templates and add them to the roles to do whatever you want. > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) > > Thank you in advance. > > -- > Ruslanas Gžibovskis > +370 6030 7030 From aschultz at redhat.com Tue Jul 14 13:29:11 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:29:11 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 7:22 AM Alex Schultz wrote: > > On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: > > > > Hi all, > > > > Borry to keep spamming you all the time. > > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: > > These don't necessarily need to be done in the image itself but you > can virt customize the image prior to uploading it to the undercloud > to inject some things. We provide ways of configuring these things at > deployment time. > > > - tuned ssh > > We have sshd configured via a service. Available options are listed in > the service file: > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > > > - automatically generated root pass to the one I need > > This can be done via a firstboot script. > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html Forgot to include this but we ship an example specifically for this: https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/firstboot/userdata_root_password.yaml > > > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) > > You'd probably want to do this via a first boot as well. If you are > deploying with overcloud images, technically you shouldn't need a > proxy on install but you'd likely need one for subsequent updates. > > > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. > > See KernelArgs. 
> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 > > https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > > > - and so on. > > > > Your best reference for what is available is likely going to be by > looking in the THT/deployment folder for the service configurations. > We don't expose everything but we do allow configurability for a > significant amount of options. *ExtraConfig can be used to tweak > additional options that we don't necessarily expose directly if you > know what options need to be set via the appropriate puppet modules. > If there are services we don't actually configure, you can define your > own custom tripleo service templates and add them to the roles to do > whatever you want. > > > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) > > > > Thank you in advance. > > > > -- > > Ruslanas Gžibovskis > > +370 6030 7030 From emilien at redhat.com Tue Jul 14 13:30:00 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 14 Jul 2020 09:30:00 -0400 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core Message-ID: Hi folks, Rabi has proved deep technical understanding on the TripleO components over the last years. Initially as a major maintainer of the Heat project and then a regular contributor to TripleO, he got involved at different levels: - Optimization of the Heat templates, to reduce the number of resources or improve them to make it faster and more efficient at scale. - Migration of the Mistral workflows into native Ansible modules and Python code into tripleo-common, with end-to-end expertise. - Regular contributions to the container tooling integration. Being involved on the mailing-list and IRC channels, Rabi is always helpful to the community and here to help. He has provided thorough reviews in principal components on TripleO as well as a lot of bug fixes or new features; which contributed to make TripleO more stable and scalable. I would like to propose him be part of the TripleO core team. Thanks Rabi for your hard work! -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Tue Jul 14 13:37:35 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 16:37:35 +0300 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: Thank you Alex. I have read around in this mailinglist, that firstboot will be removed. So I was curious, what way forward to have in case it is depricated. For modifying overcloud-full.qcow2 with virt-customise it do not look nice, would be prety to do it on image generation, not sure where tho... maybe writing own module might do the trick, but I find it as dirty workaround :)) yes, for osp modules, i know how to use puppet to provide needed values. I thought this might be for everything. regarding proxy in certain compute, it needs to do dnf update for centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am confused why, but it does so. On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: > On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis > wrote: > > > > Hi all, > > > > Borry to keep spamming you all the time. 
> > But could you help me to find a correct place to "modify" image content > (packages installed and not installed) and files and services configured in > an "adjusted" way so I would have for example: > > These don't necessarily need to be done in the image itself but you > can virt customize the image prior to uploading it to the undercloud > to inject some things. We provide ways of configuring these things at > deployment time. > > > - tuned ssh > > We have sshd configured via a service. Available options are listed in > the service file: > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > > > - automatically generated root pass to the one I need > > This can be done via a firstboot script. > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html > > > - Also added proxy config to /etc/yum.conf to certain computes, and > other would be used without proxy (maybe extraconfig option?) > > You'd probably want to do this via a first boot as well. If you are > deploying with overcloud images, technically you shouldn't need a > proxy on install but you'd likely need one for subsequent updates. > > > - set up kernel parameters, so I would have console output duplicated > to serial connection and to iDRAC serial, so I could see login screen over > idrac ssh. > > See KernelArgs. > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 > > > https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > > > - and so on. > > > > Your best reference for what is available is likely going to be by > looking in the THT/deployment folder for the service configurations. > We don't expose everything but we do allow configurability for a > significant amount of options. *ExtraConfig can be used to tweak > additional options that we don't necessarily expose directly if you > know what options need to be set via the appropriate puppet modules. > If there are services we don't actually configure, you can define your > own custom tripleo service templates and add them to the roles to do > whatever you want. > > > I believe many of those things can be done over extraconfig, I just do > not know options to modify. maybe you can point me like a blind hen into a > correct bowl? :))) > > > > Thank you in advance. > > > > -- > > Ruslanas Gžibovskis > > +370 6030 7030 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Jul 14 13:44:44 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:44:44 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 7:37 AM Ruslanas Gžibovskis wrote: > > Thank you Alex. > > I have read around in this mailinglist, that firstboot will be removed. So I was curious, what way forward to have in case it is depricated. > For Ussuri it's still available. In future versions we'll be switching out how we provision nodes which means the firstboot interface likely will go away and be replaced with something else during provisioning. However it's still currently valid. > For modifying overcloud-full.qcow2 with virt-customise it do not look nice, would be prety to do it on image generation, not sure where tho... 
maybe writing own module might do the trick, but I find it as dirty workaround :)) virt-customize is probably the easiest thing to just inject something unmanaged into the environment. You can technically use an AllNodesExtraConfig (example THT/environment/enable-swap.yaml & THT/extraconfig/all_nodes/swap.yaml) to do some custom script at installation time as well to manage the files. However this uses a Heat SoftwareConfig which is also deprecated. Though i'm not certain we have an official replacement for that yet. > > yes, for osp modules, i know how to use puppet to provide needed values. I thought this might be for everything. > > regarding proxy in certain compute, it needs to do dnf update for centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am confused why, but it does so. > > On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: >> >> On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: >> > >> > Hi all, >> > >> > Borry to keep spamming you all the time. >> > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: >> >> These don't necessarily need to be done in the image itself but you >> can virt customize the image prior to uploading it to the undercloud >> to inject some things. We provide ways of configuring these things at >> deployment time. >> >> > - tuned ssh >> >> We have sshd configured via a service. Available options are listed in >> the service file: >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml >> >> > - automatically generated root pass to the one I need >> >> This can be done via a firstboot script. >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html >> >> > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) >> >> You'd probably want to do this via a first boot as well. If you are >> deploying with overcloud images, technically you shouldn't need a >> proxy on install but you'd likely need one for subsequent updates. >> >> > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. >> >> See KernelArgs. >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 >> >> https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e >> >> > - and so on. >> > >> >> Your best reference for what is available is likely going to be by >> looking in the THT/deployment folder for the service configurations. >> We don't expose everything but we do allow configurability for a >> significant amount of options. *ExtraConfig can be used to tweak >> additional options that we don't necessarily expose directly if you >> know what options need to be set via the appropriate puppet modules. >> If there are services we don't actually configure, you can define your >> own custom tripleo service templates and add them to the roles to do >> whatever you want. >> >> > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) >> > >> > Thank you in advance. 
>> > >> > -- >> > Ruslanas Gžibovskis >> > +370 6030 7030 >> From aschultz at redhat.com Tue Jul 14 13:45:15 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:45:15 -0600 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 On Tue, Jul 14, 2020 at 7:39 AM Emilien Macchi wrote: > > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components over the last years. > Initially as a major maintainer of the Heat project and then a regular contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as well as a lot of bug fixes or new features; which contributed to make TripleO more stable and scalable. I would like to propose him be part of the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi From ruslanas at lpic.lt Tue Jul 14 13:50:12 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 15:50:12 +0200 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: I am not sure, but that might help. I use these steps for deployment: cp -ar /etc/yum.repos.d repos sed -i s/gpgcheck=1/gpgcheck=0/g repos/*repo export DIB_YUM_REPO_CONF="$(ls /home/stack/repos/*repo)" export STABLE_RELEASE="ussuri" export OS_YAML="/usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml" source /home/stack/stackrc mkdir /home/stack/images cd /home/stack/images openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml && openstack overcloud image upload --update-existing cd /home/stack ls /home/stack/images this works for all packages except: pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs to solve these you need to enable in repos dir HA repo (change in enable=0 to enable=1 and then this will solve you issues with most except: osops-tools-monitoring-oschecks this one, you can change by: modify line in file: /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map to have this line: "oschecks_package": "sysstat" instead of "oschecks_package": "osops-tools-monitoring-oschecks " On Tue, 14 Jul 2020 at 15:14, Alex Schultz wrote: > On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi > wrote: > > > > Thanks for your information. > > Actually, I was in doubt of using Ussuri (latest version) for my > environment. > > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but > overcloud image build got some error: > > > > $ openstack overcloud image build --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml > --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > > > > ... > > 2020-07-14 12:14:22.714 | Running install-packages install. 
> > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient > python3-barbicanclient python3-cinderclient python3-designateclient > python3-glanceclient python3-gnocchiclient python3-heatclient > python3-ironicclient python3-keystoneclient python3-manilaclient > python3-mistralclient python3-neutronclient python3-novaclient > python3-openstackclient python3-pankoclient python3-saharaclient > python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony > pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning > osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman > libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch > openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt > ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs > > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, > config-manager, copr, debug, debuginfo-install, download, > generate_completion_cache, needs-restarting, playground, repoclosure, > repodiff, repograph, repomanage, reposync > > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 > > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum > > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux > 8; generic; Linux.x86_64)' > > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream > > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 > 23:25:16 2020. > > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS > > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 > 23:25:12 2020. > > 2020-07-14 12:14:23.517 | repo: using cache for: extras > > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 > 00:15:26 2020. > > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on > Tue Jul 14 11:43:38 2020. > > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion > cache... > > 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient > > 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient > > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient > > 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient > > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient > > 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient > > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient > > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient > > 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient > > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient > > 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient > > 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient > > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient > > 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient > > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient > > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient > > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient > > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient > > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is > already installed. > > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already > installed. 
> > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote > > 2020-07-14 12:14:23.929 | No match for argument: > osops-tools-monitoring-oschecks > > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker > > 2020-07-14 12:14:23.936 | No match for argument: crudini > > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux > > 2020-07-14 12:14:23.953 | No match for argument: pacemaker > > 2020-07-14 12:14:23.957 | No match for argument: pcs > > 2020-07-14 12:14:23.961 | Error: Unable to find a match: > python3-aodhclient python3-barbicanclient python3-cinderclient > python3-designateclient python3-glanceclient python3-gnocchiclient > python3-heatclient python3-ironicclient python3-keystoneclient > python3-manilaclient python3-mistralclient python3-neutronclient > python3-novaclient python3-openstackclient python3-pankoclient > python3-saharaclient python3-swiftclient python3-zaqarclient > pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini > openstack-selinux pacemaker pcs > > > > Do you have any idea? > > > > Seems like you are missing the correct DIP_YUM_REPO_CONF setting per > #3 from > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images > > > > > > > On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: > >> > >> Hi folks, > >> > >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz > wrote: > >>> > >>> I don't believe centos8 containers are available for Train yet. The > >>> error you're hitting is because it's fetching centos7 containers and > >>> the ironic container is not backwards compatible between the two > >>> versions. If you want centos8, use Ussuri. > >>> > >> > >> fyi we started pushing centos8 train last week - slightly different > namespace - latest current-tripleo containers are pushed to > https://hub.docker.com/u/tripleotraincentos8 > >> > >> hope it helps > >> > >>> > >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi < > reza.b2008 at gmail.com> wrote: > >>> > > >>> > I found following error in ironic and container-puppet-ironic > container log during installation: > >>> > > >>> > puppet-user: Error: > /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: > Could not evaluate: Could not retrieve information from environment > production source(s) file:/tftpboot/ldlinux.c32 > >>> > > >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi > wrote: > >>> >> > >>> >> Hi, > >>> >> > >>> >> I'm going to install OpenStack Train with the help of TripleO on > CentOS 8, but undercloud installation fails with the following error: > >>> >> > >>> >> "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ > '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit > 6", " attempt(s): 3", "2020-07-08 
15:59:00,478 WARNING: 95123 -- Retrying > running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed > running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- > Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 > ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: > 95117 -- ERROR configuring zaqar"]} > >>> >> > >>> >> Any suggestion would be grateful. > >>> >> Regards, > >>> >> Reza > >>> >> > >>> >> > >>> > >>> > > > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Jul 14 13:55:53 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 14 Jul 2020 09:55:53 -0400 Subject: [cinder] cinderlib reviews needed Message-ID: <57417de5-5ee8-4778-84eb-7ddf81e0e791@gmail.com> cinderlib is on a cycle-trailing release model and the Ussuri release is coming up soon. Because it's still a new project, I thought I'd send a reminder in case it fell off your radar. These patches need to merge before we cut the release: https://review.opendev.org/720553 https://review.opendev.org/738226 https://review.opendev.org/738473 https://review.opendev.org/739190 https://review.opendev.org/738230 https://review.opendev.org/738866 https://review.opendev.org/738472 https://review.opendev.org/738213 They each have a single +2 at the moment, and they are all short, focused patches. cheers, brian From ruslanas at lpic.lt Tue Jul 14 14:33:40 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 16:33:40 +0200 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: and by the way, this is what I get in overcloud hosts: *cat centos7-rt.repo* [centos7-rt] name=CentOS 7 - Realtime baseurl=http://mirror.centos.org/centos/7/rt/x86_64/ enabled=1 gpgcheck=0 even it has a centos8 running ;) looks like, some hardcoded yaml file is still inplace :) On Tue, 14 Jul 2020 at 15:45, Alex Schultz wrote: > On Tue, Jul 14, 2020 at 7:37 AM Ruslanas Gžibovskis > wrote: > > > > Thank you Alex. > > > > I have read around in this mailinglist, that firstboot will be removed. > So I was curious, what way forward to have in case it is depricated. > > > > For Ussuri it's still available. In future versions we'll be switching > out how we provision nodes which means the firstboot interface likely > will go away and be replaced with something else during provisioning. > However it's still currently valid. > > > For modifying overcloud-full.qcow2 with virt-customise it do not look > nice, would be prety to do it on image generation, not sure where tho... > maybe writing own module might do the trick, but I find it as dirty > workaround :)) > > virt-customize is probably the easiest thing to just inject something > unmanaged into the environment. You can technically use an > AllNodesExtraConfig (example THT/environment/enable-swap.yaml & > THT/extraconfig/all_nodes/swap.yaml) to do some custom script at > installation time as well to manage the files. However this uses a > Heat SoftwareConfig which is also deprecated. Though i'm not certain > we have an official replacement for that yet. > > > > > yes, for osp modules, i know how to use puppet to provide needed values. > I thought this might be for everything. 
> > > > regarding proxy in certain compute, it needs to do dnf update for > centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am > confused why, but it does so. > > > > On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: > >> > >> On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis > wrote: > >> > > >> > Hi all, > >> > > >> > Borry to keep spamming you all the time. > >> > But could you help me to find a correct place to "modify" image > content (packages installed and not installed) and files and services > configured in an "adjusted" way so I would have for example: > >> > >> These don't necessarily need to be done in the image itself but you > >> can virt customize the image prior to uploading it to the undercloud > >> to inject some things. We provide ways of configuring these things at > >> deployment time. > >> > >> > - tuned ssh > >> > >> We have sshd configured via a service. Available options are listed in > >> the service file: > >> > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > >> > >> > - automatically generated root pass to the one I need > >> > >> This can be done via a firstboot script. > >> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html > >> > >> > - Also added proxy config to /etc/yum.conf to certain computes, and > other would be used without proxy (maybe extraconfig option?) > >> > >> You'd probably want to do this via a first boot as well. If you are > >> deploying with overcloud images, technically you shouldn't need a > >> proxy on install but you'd likely need one for subsequent updates. > >> > >> > - set up kernel parameters, so I would have console output > duplicated to serial connection and to iDRAC serial, so I could see login > screen over idrac ssh. > >> > >> See KernelArgs. > >> > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 > >> > >> > https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > >> > >> > - and so on. > >> > > >> > >> Your best reference for what is available is likely going to be by > >> looking in the THT/deployment folder for the service configurations. > >> We don't expose everything but we do allow configurability for a > >> significant amount of options. *ExtraConfig can be used to tweak > >> additional options that we don't necessarily expose directly if you > >> know what options need to be set via the appropriate puppet modules. > >> If there are services we don't actually configure, you can define your > >> own custom tripleo service templates and add them to the roles to do > >> whatever you want. > >> > >> > I believe many of those things can be done over extraconfig, I just > do not know options to modify. maybe you can point me like a blind hen into > a correct bowl? :))) > >> > > >> > Thank you in advance. > >> > > >> > -- > >> > Ruslanas Gžibovskis > >> > +370 6030 7030 > >> > > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aschultz at redhat.com Tue Jul 14 14:37:25 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 08:37:25 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: https://review.opendev.org/#/c/738154/ On Tue, Jul 14, 2020 at 8:34 AM Ruslanas Gžibovskis wrote: > > and by the way, this is what I get in overcloud hosts: > cat centos7-rt.repo > [centos7-rt] > name=CentOS 7 - Realtime > baseurl=http://mirror.centos.org/centos/7/rt/x86_64/ > enabled=1 > gpgcheck=0 > > even it has a centos8 running ;) > looks like, some hardcoded yaml file is still inplace :) > > On Tue, 14 Jul 2020 at 15:45, Alex Schultz wrote: >> >> On Tue, Jul 14, 2020 at 7:37 AM Ruslanas Gžibovskis wrote: >> > >> > Thank you Alex. >> > >> > I have read around in this mailinglist, that firstboot will be removed. So I was curious, what way forward to have in case it is depricated. >> > >> >> For Ussuri it's still available. In future versions we'll be switching >> out how we provision nodes which means the firstboot interface likely >> will go away and be replaced with something else during provisioning. >> However it's still currently valid. >> >> > For modifying overcloud-full.qcow2 with virt-customise it do not look nice, would be prety to do it on image generation, not sure where tho... maybe writing own module might do the trick, but I find it as dirty workaround :)) >> >> virt-customize is probably the easiest thing to just inject something >> unmanaged into the environment. You can technically use an >> AllNodesExtraConfig (example THT/environment/enable-swap.yaml & >> THT/extraconfig/all_nodes/swap.yaml) to do some custom script at >> installation time as well to manage the files. However this uses a >> Heat SoftwareConfig which is also deprecated. Though i'm not certain >> we have an official replacement for that yet. >> >> > >> > yes, for osp modules, i know how to use puppet to provide needed values. I thought this might be for everything. >> > >> > regarding proxy in certain compute, it needs to do dnf update for centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am confused why, but it does so. >> > >> > On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: >> >> >> >> On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: >> >> > >> >> > Hi all, >> >> > >> >> > Borry to keep spamming you all the time. >> >> > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: >> >> >> >> These don't necessarily need to be done in the image itself but you >> >> can virt customize the image prior to uploading it to the undercloud >> >> to inject some things. We provide ways of configuring these things at >> >> deployment time. >> >> >> >> > - tuned ssh >> >> >> >> We have sshd configured via a service. Available options are listed in >> >> the service file: >> >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml >> >> >> >> > - automatically generated root pass to the one I need >> >> >> >> This can be done via a firstboot script. >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html >> >> >> >> > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) 
>> >> >> >> You'd probably want to do this via a first boot as well. If you are >> >> deploying with overcloud images, technically you shouldn't need a >> >> proxy on install but you'd likely need one for subsequent updates. >> >> >> >> > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. >> >> >> >> See KernelArgs. >> >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 >> >> >> >> https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e >> >> >> >> > - and so on. >> >> > >> >> >> >> Your best reference for what is available is likely going to be by >> >> looking in the THT/deployment folder for the service configurations. >> >> We don't expose everything but we do allow configurability for a >> >> significant amount of options. *ExtraConfig can be used to tweak >> >> additional options that we don't necessarily expose directly if you >> >> know what options need to be set via the appropriate puppet modules. >> >> If there are services we don't actually configure, you can define your >> >> own custom tripleo service templates and add them to the roles to do >> >> whatever you want. >> >> >> >> > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) >> >> > >> >> > Thank you in advance. >> >> > >> >> > -- >> >> > Ruslanas Gžibovskis >> >> > +370 6030 7030 >> >> >> > > > -- > Ruslanas Gžibovskis > +370 6030 7030 From jungleboyj at gmail.com Tue Jul 14 14:41:59 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Tue, 14 Jul 2020 09:41:59 -0500 Subject: [cinder] cinderlib reviews needed In-Reply-To: <57417de5-5ee8-4778-84eb-7ddf81e0e791@gmail.com> References: <57417de5-5ee8-4778-84eb-7ddf81e0e791@gmail.com> Message-ID: Brian, Thanks for highlighting.  I have taken care of most of these. There was just one that I thought should have some more eyes on it. Thanks! Jay On 7/14/2020 8:55 AM, Brian Rosmaita wrote: > cinderlib is on a cycle-trailing release model and the Ussuri release > is coming up soon.  Because it's still a new project, I thought I'd > send a reminder in case it fell off your radar.  These patches need to > merge before we cut the release: > > https://review.opendev.org/720553 > https://review.opendev.org/738226 > https://review.opendev.org/738473 > https://review.opendev.org/739190 > https://review.opendev.org/738230 > https://review.opendev.org/738866 > https://review.opendev.org/738472 > https://review.opendev.org/738213 > > They each have a single +2 at the moment, and they are all short, > focused patches. > > cheers, > brian > > From gagehugo at gmail.com Tue Jul 14 14:54:17 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Tue, 14 Jul 2020 09:54:17 -0500 Subject: [security] Security SIG meeting July 16th 2020 canceled Message-ID: Hello everyone, The security sig meeting this week will be cancelled due to the 10 years of openstack celebration happening at the same time. We will meet next week at the scheduled time. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marios at redhat.com Tue Jul 14 14:58:48 2020 From: marios at redhat.com (Marios Andreou) Date: Tue, 14 Jul 2020 17:58:48 +0300 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1000 I thought he already was core ? used to be? On Tue, Jul 14, 2020 at 4:31 PM Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Tue Jul 14 15:10:12 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 14 Jul 2020 17:10:12 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: On 7/14/20 3:30 PM, Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components over the > last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and Python code > into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always helpful to > the community and here to help. > He has provided thorough reviews in principal components on TripleO as well as a > lot of bug fixes or new features; which contributed to make TripleO more stable > and scalable. I would like to propose him be part of the TripleO core team. > > Thanks Rabi for your hard work! +1 > -- > Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando From katalsupriya36 at gmail.com Tue Jul 14 07:53:20 2020 From: katalsupriya36 at gmail.com (supriya katal) Date: Tue, 14 Jul 2020 13:23:20 +0530 Subject: Cloud Computing Resource Message-ID: Hello Team I have checked your sites. https://github.com/openstacknetsdk/openstack.net/wiki/Getting-Started-With-The-OpenStack-NET-SDK For using this sdk, one needs to create an account for *RECKSPACE *open cloud. I have an account in STACKPATH https://control.stackpath.com/ Can I use a stackpath storage object for uploading and accessing the files? I have tried to use your api for uploading and accessing files of STACKPATH object storage. https://docs.openstack.org/api-ref/object-store/index.html?expanded=create-or-replace-object-detail#objects but I got an error of Access Denied. 
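For context on that error: the object-store API referenced above authenticates every request with a token issued by a Keystone-compatible identity service, so an "Access Denied" response usually points at missing or wrongly scoped credentials rather than at the request format (whether a third-party service such as StackPath exposes a compatible endpoint at all is a separate question). A generic sketch of an authenticated upload, in which every endpoint and name is a placeholder, looks roughly like:

# obtain a token from the identity service this object store trusts
TOKEN=$(openstack token issue -f value -c id)

# PUT an object into an existing container
curl -i \
  -H "X-Auth-Token: ${TOKEN}" \
  -T ./report.txt \
  "https://objectstore.example.com/v1/AUTH_myproject/mycontainer/report.txt"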
-------------- next part -------------- An HTML attachment was scrubbed... URL: From berrange at redhat.com Tue Jul 14 10:21:29 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 14 Jul 2020 11:21:29 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200713232957.GD5955@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> Message-ID: <20200714102129.GD25187@redhat.com> On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > hi folks, > we are defining a device migration compatibility interface that helps upper > layer stack like openstack/ovirt/libvirt to check if two devices are > live migration compatible. > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > e.g. we could use it to check whether > - a src MDEV can migrate to a target MDEV, > - a src VF in SRIOV can migrate to a target VF in SRIOV, > - a src MDEV can migration to a target VF in SRIOV. > (e.g. SIOV/SRIOV backward compatibility case) > > The upper layer stack could use this interface as the last step to check > if one device is able to migrate to another device before triggering a real > live migration procedure. > we are not sure if this interface is of value or help to you. please don't > hesitate to drop your valuable comments. > > > (1) interface definition > The interface is defined in below way: > > __ userspace > /\ \ > / \write > / read \ > ________/__________ ___\|/_____________ > | migration_version | | migration_version |-->check migration > --------------------- --------------------- compatibility > device A device B > > > a device attribute named migration_version is defined under each device's > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > userspace tools read the migration_version as a string from the source device, > and write it to the migration_version sysfs attribute in the target device. > > The userspace should treat ANY of below conditions as two devices not compatible: > - any one of the two devices does not have a migration_version attribute > - error when reading from migration_version attribute of one device > - error when writing migration_version string of one device to > migration_version attribute of the other device > > The string read from migration_version attribute is defined by device vendor > driver and is completely opaque to the userspace. > for a Intel vGPU, string format can be defined like > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > for an NVMe VF connecting to a remote storage. it could be > "PCI ID" + "driver version" + "configured remote storage URL" > > for a QAT VF, it may be > "PCI ID" + "driver version" + "supported encryption set". > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > (2) backgrounds > > The reason we hope the migration_version string is opaque to the userspace > is that it is hard to generalize standard comparing fields and comparing > methods for different devices from different vendors. > Though userspace now could still do a simple string compare to check if > two devices are compatible, and result should also be right, it's still > too limited as it excludes the possible candidate whose migration_version > string fails to be equal. > e.g. 
an MDEV with mdev_type_1, aggregator count 3 is probably compatible > with another MDEV with mdev_type_3, aggregator count 1, even their > migration_version strings are not equal. > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > besides that, driver version + configured resources are all elements demanding > to take into account. > > So, we hope leaving the freedom to vendor driver and let it make the final decision > in a simple reading from source side and writing for test in the target side way. > > > we then think the device compatibility issues for live migration with assigned > devices can be divided into two steps: > a. management tools filter out possible migration target devices. > Tags could be created according to info from product specification. > we think openstack/ovirt may have vendor proprietary components to create > those customized tags for each product from each vendor. > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > search target vGPU are like: > a tag for compatible parent PCI IDs, > a tag for a range of gvt driver versions, > a tag for a range of mdev type + aggregator count > > for NVMe VF, the tags to search target VF may be like: > a tag for compatible PCI IDs, > a tag for a range of driver versions, > a tag for URL of configured remote storage. Requiring management application developers to figure out this possible compatibility based on prod specs is really unrealistic. Product specs are typically as clear as mud, and with the suggestion we consider different rules for different types of devices, add up to a huge amount of complexity. This isn't something app developers should have to spend their time figuring out. The suggestion that we make use of vendor proprietary helper components is totally unacceptable. We need to be able to build a solution that works with exclusively an open source software stack. IMHO there needs to be a mechanism for the kernel to report via sysfs what versions are supported on a given device. This puts the job of reporting compatible versions directly under the responsibility of the vendor who writes the kernel driver for it. They are the ones with the best knowledge of the hardware they've built and the rules around its compatibility. > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > device migration compatibility interface to make sure the two devices are > indeed live migration compatible before launching the real live migration > process to start stream copying, src device stopping and target device > resuming. 
> It is supposed that this step would not bring any performance penalty as > -in kernel it's just a simple string decoding and comparing > -in openstack/ovirt, it could be done by extending current function > check_can_live_migrate_destination, along side claiming target resources.[1] > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > Thanks > Yan > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From smooney at redhat.com Tue Jul 14 12:33:24 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 14 Jul 2020 13:33:24 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714102129.GD25187@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> Message-ID: On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote: > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > hi folks, > > we are defining a device migration compatibility interface that helps upper > > layer stack like openstack/ovirt/libvirt to check if two devices are > > live migration compatible. > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > e.g. we could use it to check whether > > - a src MDEV can migrate to a target MDEV, mdev live migration is completely possible to do but i agree with Dan barrange's comments from the point of view of openstack integration i dont see calling out to a vender sepecific tool to be an accpetable solutions for device compatiablity checking. the sys filesystem that describs the mdevs that can be created shoudl also contain the relevent infomation such taht nova could integrate it via libvirt xml representation or directly retrive the info from sysfs. > > - a src VF in SRIOV can migrate to a target VF in SRIOV, so vf to vf migration is not possible in the general case as there is no standarised way to transfer teh device state as part of the siorv specs produced by the pci-sig as such there is not vender neutral way to support sriov live migration. > > - a src MDEV can migration to a target VF in SRIOV. that also makes this unviable > > (e.g. SIOV/SRIOV backward compatibility case) > > > > The upper layer stack could use this interface as the last step to check > > if one device is able to migrate to another device before triggering a real > > live migration procedure. well actully that is already too late really. ideally we would want to do this compaiablity check much sooneer to avoid the migration failing. in an openstack envionment at least by the time we invoke libvirt (assuming your using the libvirt driver) to do the migration we have alreaedy finished schduling the instance to the new host. if if we do the compatiablity check at this point and it fails then the live migration is aborted and will not be retired. These types of late check lead to a poor user experince as unless you check the migration detial it basically looks like the migration was ignored as it start to migrate and then continuge running on the orgininal host. when using generic pci passhotuhg with openstack, the pci alias is intended to reference a single vendor id/product id so you will have 1+ alias for each type of device. 
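for anyone less familiar with that mechanism, a pci alias is just a static entry in nova.conf, something along these lines (the vendor/product ids below are examples only):

[pci]
# compute nodes track devices matching the whitelist in their pci device pool
passthrough_whitelist = { "vendor_id": "8086", "product_id": "154c" }
# flavors then request a device through the alias name,
# e.g. with the extra spec pci_passthrough:alias=my-vf:1
alias = { "vendor_id": "8086", "product_id": "154c", "device_type": "type-VF", "name": "my-vf" }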
that allows openstack to schedule based on the availability of a compatibale device because we track inventories of pci devices and can query that when selecting a host. if we were to support mdev live migration in the future we would want to take the same declarative approch. 1 interospec the capability of the deivce we manage 2 create inventories of the allocatable devices and there capabilities 3 schdule the instance to a host based on the device-type/capabilities and claim it atomicly to prevent raceces 4 have the lower level hyperviors do addtional validation if need prelive migration. this proposal seams to be targeting extending step 4 where as ideally we should focuse on providing the info that would be relevant in set 1 preferably in a vendor neutral way vai a kernel interface like /sys. > > we are not sure if this interface is of value or help to you. please don't > > hesitate to drop your valuable comments. > > > > > > (1) interface definition > > The interface is defined in below way: > > > > __ userspace > > /\ \ > > / \write > > / read \ > > ________/__________ ___\|/_____________ > > | migration_version | | migration_version |-->check migration > > --------------------- --------------------- compatibility > > device A device B > > > > > > a device attribute named migration_version is defined under each device's > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). this might be useful as we could tag the inventory with the migration version and only might to devices with the same version > > userspace tools read the migration_version as a string from the source device, > > and write it to the migration_version sysfs attribute in the target device. this would not be useful as the schduler cannot directlly connect to the compute host and even if it could it would be extreamly slow to do this for 1000s of hosts and potentally multiple devices per host. > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > - any one of the two devices does not have a migration_version attribute > > - error when reading from migration_version attribute of one device > > - error when writing migration_version string of one device to > > migration_version attribute of the other device > > > > The string read from migration_version attribute is defined by device vendor > > driver and is completely opaque to the userspace. opaque vendor specific stings that higher level orchestros have to pass form host to host and cant reason about are evil, when allowed they prolifroate and makes any idea of a vendor nutral abstraction and interoperablity between systems impossible to reason about. that said there is a way to make it opaue but still useful to userspace. see below > > for a Intel vGPU, string format can be defined like > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > for an NVMe VF connecting to a remote storage. it could be > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > for a QAT VF, it may be > > "PCI ID" + "driver version" + "supported encryption set". > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) honestly i would much prefer if the version string was just a semver string. e.g. {major}.{minor}.{bugfix} if you do a driver/frimware update and break compatiablity with an older version bump the major version. 
if you add optional a feature that does not break backwards compatiablity if you migrate an older instance to the new host then just bump the minor/feature number. if you have a fix for a bug that does not change the feature set or compatiblity backwards or forwards then bump the bugfix number then the check is as simple as 1.) is the mdev type the same 2.) is the major verion the same 3.) am i going form the same version to same version or same version to newer version if all 3 are true we can migrate. e.g. 2.0.1 -> 2.1.1 (ok same major version and migrating from older feature release to newer feature release) 2.1.1 -> 2.0.1 (not ok same major version and migrating from new feature release to old feature release may be incompatable) 2.0.0 -> 3.0.0 (not ok chaning major version) 2.0.1 -> 2.0.0 (ok same major and minor version, all bugfixs in the same minor release should be compatibly) we dont need vendor to rencode the driver name or vendor id and product id in the string. that info is alreay available both to the device driver and to userspace via /sys already we just need to know if version of the same mdev are compatiable so a simple semver version string which is well know in the software world at least is a clean abstration we can reuse. > > (2) backgrounds > > > > The reason we hope the migration_version string is opaque to the userspace > > is that it is hard to generalize standard comparing fields and comparing > > methods for different devices from different vendors. > > Though userspace now could still do a simple string compare to check if > > two devices are compatible, and result should also be right, it's still > > too limited as it excludes the possible candidate whose migration_version > > string fails to be equal. > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > with another MDEV with mdev_type_3, aggregator count 1, even their > > migration_version strings are not equal. > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > besides that, driver version + configured resources are all elements demanding > > to take into account. > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > in a simple reading from source side and writing for test in the target side way. > > > > > > we then think the device compatibility issues for live migration with assigned > > devices can be divided into two steps: > > a. management tools filter out possible migration target devices. > > Tags could be created according to info from product specification. > > we think openstack/ovirt may have vendor proprietary components to create > > those customized tags for each product from each vendor. > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > search target vGPU are like: > > a tag for compatible parent PCI IDs, > > a tag for a range of gvt driver versions, > > a tag for a range of mdev type + aggregator count > > > > for NVMe VF, the tags to search target VF may be like: > > a tag for compatible PCI IDs, > > a tag for a range of driver versions, > > a tag for URL of configured remote storage. > > Requiring management application developers to figure out this possible > compatibility based on prod specs is really unrealistic. Product specs > are typically as clear as mud, and with the suggestion we consider > different rules for different types of devices, add up to a huge amount > of complexity. 
This isn't something app developers should have to spend > their time figuring out. > > The suggestion that we make use of vendor proprietary helper components > is totally unacceptable. We need to be able to build a solution that > works with exclusively an open source software stack. > > IMHO there needs to be a mechanism for the kernel to report via sysfs > what versions are supported on a given device. This puts the job of > reporting compatible versions directly under the responsibility of the > vendor who writes the kernel driver for it. They are the ones with the > best knowledge of the hardware they've built and the rules around its > compatibility.
Yep, totally agree with that statement.
> > > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > > device migration compatibility interface to make sure the two devices are > > indeed live migration compatible before launching the real live migration > > process to start stream copying, src device stopping and target device > > resuming. > > It is supposed that this step would not bring any performance penalty as > > -in kernel it's just a simple string decoding and comparing > > -in openstack/ovirt, it could be done by extending current function > > check_can_live_migrate_destination, along side claiming target resources.[1]
That is a compute driver function https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/virt/driver.py#L1261-L1278 that is called in the conductor here https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/conductor/tasks/live_migrate.py#L360-L364
If the check fails (ignoring the fact that it is expensive to do an RPC to the compute host) we raise an exception and move on to the next host in the alternate host list. https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/conductor/tasks/live_migrate.py#L556-L567
By default the alternate host list is 3 (https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.max_attempts), so there would be a pretty high likelihood that, if we only checked compatibility at this point, the migration would fail. Realistically speaking this is too late. We can do a final safety check at this point, but it should not be the first time we check compatibility. At a minimum we would have wanted to select a host with the same mdev type first; we can do that from the info we have today, but I hope I have made the point that declarative interfaces which we can introspect, without opaque vendor specific blobs, are vastly more consumable than imperative interfaces we have to probe. From a security and packaging point of view this is better too: if I only need read-only access to sysfs instead of write access, and I don't need to package a bunch of additional vendor tools in a containerised deployment, that significantly decreases the potential attack surface.
> > > > > > > > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > > > Thanks > > Yan > > > > Regards, > Daniel

From ionut at fleio.com Tue Jul 14 14:00:52 2020 From: ionut at fleio.com (Ionut Biru) Date: Tue, 14 Jul 2020 17:00:52 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi, Thanks for the information. I made it work by using only one attribute; at that time the error was something related to the type of an attribute and I got rid of it.
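Since most of the earlier failures in this thread were plain YAML indentation or syntax mistakes, it can save a round trip to sanity-check the pollster file before restarting the agent. A small sketch in Python using PyYAML; the file path is only an example, adjust it to wherever your pollsters.d directory lives:

    import sys
    import yaml

    path = "/etc/ceilometer/pollsters.d/octavia.yaml"
    try:
        with open(path) as f:
            pollsters = yaml.safe_load(f)
    except yaml.YAMLError as exc:
        sys.exit("invalid YAML in %s: %s" % (path, exc))

    # each pollster entry should at least carry a name and a value_attribute
    for pollster in pollsters or []:
        print(pollster.get("name"), "->", pollster.get("value_attribute"))

This only catches YAML-level problems, not attribute-type issues like the one mentioned above, but it rules out the indentation errors quickly.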
On Fri, Jul 10, 2020 at 8:24 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > Sure, this is a minimalistic config I used for testing (watch for the > indentation issues that might happen due to copy/paste into Gmail). > >> cat ceilometer/pollsters.d/vpn-connection-dynamic-pollster.yaml >> --- >> >> - name: "dynamic_pollster.network.services.vpn.connection" >> sample_type: "gauge" >> unit: "ipsec_site_connection" >> value_attribute: "status" >> endpoint_type: "network" >> url_path: "v2.0/vpn/ipsec-site-connections" >> metadata_fields: >> - "name" >> - "vpnservice_id" >> - "description" >> - "status" >> - "peer_address" >> value_mapping: >> ACTIVE: "1" >> DOWN: "0" >> metadata_mapping: >> name: "display_name" >> default_value: 0 >> > > Then, the polling.yaml file > > cat ceilometer/polling.yaml | grep -A 3 vpnass >> - name: vpnass_pollsters >> interval: 600 >> meters: >> - dynamic_pollster.network.services.vpn.connection >> > > And last, but not least, the custom_gnocchi_resources file. > >> cat ceilometer/custom_gnocchi_resources.yaml | grep -B 2 -A 9 >> "dynamic_pollster.network.services.vpn.connection" >> - resource_type: s2svpn >> metrics: >> dynamic_pollster.network.services.vpn.connection: >> attributes: >> name: resource_metadata.name >> vpnservice_id: resource_metadata.vpnservice_id >> description: resource_metadata.description >> status: resource_metadata.status >> peer_address: resource_metadata.peer_address >> display_name: resource_metadata.display_name >> > > Bear in mind that you need to create the Gnocchi resource type. > >> gnocchi resource-type show s2svpn >> >> +--------------------------+-----------------------------------------------------------+ >> | Field | Value >> | >> >> +--------------------------+-----------------------------------------------------------+ >> | attributes/description | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/display_name | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/name | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/peer_address | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/status | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/vpnservice_id | required=False, type=uuid >> | >> | name | s2svpn >> | >> | state | active >> | >> >> +--------------------------+-----------------------------------------------------------+ >> > > What is the problem you are having? > > On Fri, Jul 10, 2020 at 10:50 AM Ionut Biru wrote: > >> Hi again, >> >> I did not manage to make it work, I cannot figure out how to connect all >> the pieces. >> >> pollsters.d/octavia.yaml https://paste.xinu.at/DERxh1/ >> pipeline.yaml https://paste.xinu.at/u1E42/ >> polling.yaml https://paste.xinu.at/MZWNs/ >> gnocchi_resources.yaml https://paste.xinu.at/j3AX/ >> gnocchi_client.py in resources_update_operations >> https://paste.xinu.at/no5/ >> gnocchi resource-type show https://paste.xinu.at/7mZIyZ/ >> Do you mind if you do a full example >> using "dynamic.network.services.vpn.connection" from >> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >> ? >> >> Or maybe you can point me to the mistakes made in my configuration? >> >> >> On Tue, Jul 7, 2020 at 2:43 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> That is the right direction. 
I don't know why people hard-coded the >>> initial pollsters' configs and did not document the relation between >>> Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a >>> single system, but interdependent systems to implement a monitoring >>> solution. Ceilometer is the component that gathers data/information, >>> processes, and then persists it somewhere. Gnocchi is one of the options >>> that Ceilometer can use to persist data. By default, Ceilometer creates >>> some basic configurations in Gnocchi to store data, such as some default >>> resource-types with default attributes. However, we do not need (should >>> not) rely on this default config. >>> >>> You can create and use custom resources to fit the stack to your needs. >>> This can be achieved via `gnocchi resource-type create -a >>> :: ` and >>> `gnocchi resource-type create -u >>> :: `. >>> Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), >>> you can customize the mapping of metrics to resource-types in Gnocchi. >>> >>> On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: >>> >>>> Hello again, >>>> >>>> What's the proper way to handle dynamic pollsters in gnocchi ? >>>> Right now ceilometer returns: >>>> >>>> WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia >>>> is not handled by Gnocchi >>>> >>>> I found >>>> https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html >>>> but I'm not sure if is the right direction. >>>> >>>> On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: >>>> >>>>> Seems to work fine now. Thanks. >>>>> >>>>> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> It looks like a coding error that we left behind during a major >>>>>> refactoring that we introduced upstream. >>>>>> I created a patch for it. Can you check/review and test it? >>>>>> https://review.opendev.org/739555 >>>>>> >>>>>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>>>>> >>>>>>> Hi Rafael, >>>>>>> >>>>>>> I have an error and I cannot resolve it myself. >>>>>>> >>>>>>> https://paste.xinu.at/LEfdXD/ >>>>>>> >>>>>>> Do you happen to know what's wrong? >>>>>>> >>>>>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>>>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>>>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>>>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>>>>> >>>>>>> >>>>>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>> >>>>>>>> Good catch. I fixed the docs. >>>>>>>> https://review.opendev.org/#/c/739288/ >>>>>>>> >>>>>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I just noticed that the example >>>>>>>>> dynamic.network.services.vpn.connection from >>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>>>>> the wrong indentation. >>>>>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>>>>> >>>>>>>>> Now I have to see why is not polling from it >>>>>>>>> >>>>>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>>>>> >>>>>>>>>> Hi Rafael, >>>>>>>>>> >>>>>>>>>> I think I applied all the reviews successfully but I tried to do >>>>>>>>>> an octavia dynamic poller but I have couples of errors. 
>>>>>>>>>> >>>>>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>>>>> Error is about syntax error near name: >>>>>>>>>> https://paste.xinu.at/MHgDBY/ >>>>>>>>>> >>>>>>>>>> if i remove the - in front of name like this: >>>>>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>>>>> >>>>>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I would say so. We are lacking people to review and then merge >>>>>>>>>>> it. >>>>>>>>>>> >>>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>>> production? >>>>>>>>>>>> >>>>>>>>>>> As long as the person executing the cherry-picks, and >>>>>>>>>>> maintaining the code knows what she/he is doing, you should be safe. The >>>>>>>>>>> guys that are using this implementation (and others that I and my >>>>>>>>>>> colleagues proposed), have a few openstack components that are customized >>>>>>>>>>> with the patches/enhancements/extensions we developed so far; this means, >>>>>>>>>>> they are not using the community version, but something in-between (the >>>>>>>>>>> community releases + the patches we did). Of course, it is only possible, >>>>>>>>>>> because we are the ones creating and maintaining these codes; therefore, we >>>>>>>>>>> can assure quality for production. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello Rafael, >>>>>>>>>>>> >>>>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>>> >>>>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>>> production? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>>>>> that might be important for your use case: >>>>>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think I misunderstood your use case, sorry. 
I read it as if >>>>>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll >>>>>>>>>>>>>> defer to the Ceilometer project. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru < >>>>>>>>>>>>>>>> ionut at fleio.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I found >>>>>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was wondering if there is a way to meter >>>>>>>>>>>>>>>>> loadbalancers from octavia. >>>>>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer >>>>>>>>>>>>>>>>> was deployed and has status active. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can get the provisioning and operating status of >>>>>>>>>>>>>>>> Octavia load balancers via the Octavia API. There is also an API endpoint >>>>>>>>>>>>>>>> that returns the full load balancer status tree [1]. >>>>>>>>>>>>>>>> Additionally, Octavia has three API endpoints for >>>>>>>>>>>>>>>> statistics [2][3][4]. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I hope this helps with your use case. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> Carlos >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Rafael Weingärtner >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jul 14 15:41:02 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 14 Jul 2020 15:41:02 +0000 Subject: Cloud Computing Resource In-Reply-To: References: Message-ID: <20200714154102.d5zso2liey6ztmzf@yuggoth.org> On 2020-07-14 13:23:20 +0530 (+0530), supriya katal wrote: > I have checked your sites. > > https://github.com/openstacknetsdk/openstack.net/wiki/Getting-Started-With-The-OpenStack-NET-SDK [...] Contrary to its name, that does not appear to have been created by the OpenStack community. Their documentation indicates you should contact sdk-support at rackspace.com with any questions. It also looks like the most recent release for it was 4 years ago, and the most recent commit to merge in their default Git branch was from two years ago, so I would not be surprised if they're no longer actively maintaining it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From whayutin at redhat.com Tue Jul 14 15:50:48 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Tue, 14 Jul 2020 09:50:48 -0600 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 9:11 AM Bogdan Dobrelya wrote: > On 7/14/20 3:30 PM, Emilien Macchi wrote: > > Hi folks, > > > > Rabi has proved deep technical understanding on the TripleO components > over the > > last years. 
> > Initially as a major maintainer of the Heat project and then a regular > > contributor to TripleO, he got involved at different levels: > > - Optimization of the Heat templates, to reduce the number of resources > or > > improve them to make it faster and more efficient at scale. > > - Migration of the Mistral workflows into native Ansible modules and > Python code > > into tripleo-common, with end-to-end expertise. > > - Regular contributions to the container tooling integration. > > > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to > > the community and here to help. > > He has provided thorough reviews in principal components on TripleO as > well as a > > lot of bug fixes or new features; which contributed to make TripleO more > stable > > and scalable. I would like to propose him be part of the TripleO core > team. > > > > Thanks Rabi for your hard work! > > +1 > > > -- > > Emilien Macchi > > Thanks for raising this Emilien!! Thank you to Rabi for your excellent work! +1 > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.williamson at redhat.com Tue Jul 14 16:16:16 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 10:16:16 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714102129.GD25187@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> Message-ID: <20200714101616.5d3a9e75@x1.home> On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrangé wrote: > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > hi folks, > > we are defining a device migration compatibility interface that helps upper > > layer stack like openstack/ovirt/libvirt to check if two devices are > > live migration compatible. > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > e.g. we could use it to check whether > > - a src MDEV can migrate to a target MDEV, > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > - a src MDEV can migration to a target VF in SRIOV. > > (e.g. SIOV/SRIOV backward compatibility case) > > > > The upper layer stack could use this interface as the last step to check > > if one device is able to migrate to another device before triggering a real > > live migration procedure. > > we are not sure if this interface is of value or help to you. please don't > > hesitate to drop your valuable comments. > > > > > > (1) interface definition > > The interface is defined in below way: > > > > __ userspace > > /\ \ > > / \write > > / read \ > > ________/__________ ___\|/_____________ > > | migration_version | | migration_version |-->check migration > > --------------------- --------------------- compatibility > > device A device B > > > > > > a device attribute named migration_version is defined under each device's > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > userspace tools read the migration_version as a string from the source device, > > and write it to the migration_version sysfs attribute in the target device. 
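[For the management-stack side, the userspace flow described just above is tiny. A minimal sketch in Python, using the example sysfs path from the proposal and treating any missing attribute or read/write error as "not compatible", per the rules quoted next; the helper name is invented for illustration:]

    def devices_compatible(src_dev_sysfs, dst_dev_sysfs):
        """Probe the proposed migration_version interface for two devices.

        src_dev_sysfs/dst_dev_sysfs are device sysfs directories, e.g.
        /sys/bus/pci/devices/0000:00:02.0/<mdev uuid> in the example above.
        """
        try:
            # read the opaque string from the source device...
            with open(src_dev_sysfs + "/migration_version") as f:
                version = f.read().strip()
            # ...and write it to the target; the vendor driver accepts or rejects it
            with open(dst_dev_sysfs + "/migration_version", "w") as f:
                f.write(version)
        except OSError:
            return False
        return True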
> > > > The userspace should treat ANY of below conditions as two devices not compatible: > > - any one of the two devices does not have a migration_version attribute > > - error when reading from migration_version attribute of one device > > - error when writing migration_version string of one device to > > migration_version attribute of the other device > > > > The string read from migration_version attribute is defined by device vendor > > driver and is completely opaque to the userspace. > > for a Intel vGPU, string format can be defined like > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > for an NVMe VF connecting to a remote storage. it could be > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > for a QAT VF, it may be > > "PCI ID" + "driver version" + "supported encryption set". > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) It's very strange to define it as opaque and then proceed to describe the contents of that opaque string. The point is that its contents are defined by the vendor driver to describe the device, driver version, and possibly metadata about the configuration of the device. One instance of a device might generate a different string from another. The string that a device produces is not necessarily the only string the vendor driver will accept, for example the driver might support backwards compatible migrations. > > (2) backgrounds > > > > The reason we hope the migration_version string is opaque to the userspace > > is that it is hard to generalize standard comparing fields and comparing > > methods for different devices from different vendors. > > Though userspace now could still do a simple string compare to check if > > two devices are compatible, and result should also be right, it's still > > too limited as it excludes the possible candidate whose migration_version > > string fails to be equal. > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > with another MDEV with mdev_type_3, aggregator count 1, even their > > migration_version strings are not equal. > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > besides that, driver version + configured resources are all elements demanding > > to take into account. > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > in a simple reading from source side and writing for test in the target side way. > > > > > > we then think the device compatibility issues for live migration with assigned > > devices can be divided into two steps: > > a. management tools filter out possible migration target devices. > > Tags could be created according to info from product specification. > > we think openstack/ovirt may have vendor proprietary components to create > > those customized tags for each product from each vendor. > > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > search target vGPU are like: > > a tag for compatible parent PCI IDs, > > a tag for a range of gvt driver versions, > > a tag for a range of mdev type + aggregator count > > > > for NVMe VF, the tags to search target VF may be like: > > a tag for compatible PCI IDs, > > a tag for a range of driver versions, > > a tag for URL of configured remote storage. I interpret this as hand waving, ie. 
the first step is for management tools to make a good guess :-\ We don't seem to be willing to say that a given mdev type can only migrate to a device with that same type. There's this aggregation discussion happening separately where a base mdev type might be created or later configured to be equivalent to a different type. The vfio migration API we've defined is also not limited to mdev devices, for example we could create vendor specific quirks or hooks to provide migration support for a physical PF/VF device. Within the realm of possibility then is that we could migrate between a physical device and an mdev device, which are simply different degrees of creating a virtualization layer in front of the device. > Requiring management application developers to figure out this possible > compatibility based on prod specs is really unrealistic. Product specs > are typically as clear as mud, and with the suggestion we consider > different rules for different types of devices, add up to a huge amount > of complexity. This isn't something app developers should have to spend > their time figuring out. Agreed. > The suggestion that we make use of vendor proprietary helper components > is totally unacceptable. We need to be able to build a solution that > works with exclusively an open source software stack. I'm surprised to see this as well, but I'm not sure if Yan was really suggesting proprietary software so much as just vendor specific knowledge. > IMHO there needs to be a mechanism for the kernel to report via sysfs > what versions are supported on a given device. This puts the job of > reporting compatible versions directly under the responsibility of the > vendor who writes the kernel driver for it. They are the ones with the > best knowledge of the hardware they've built and the rules around its > compatibility. The version string discussed previously is the version string that represents a given device, possibly including driver information, configuration, etc. I think what you're asking for here is an enumeration of every possible version string that a given device could accept as an incoming migration stream. If we consider the string as opaque, that means the vendor driver needs to generate a separate string for every possible version it could accept, for every possible configuration option. That potentially becomes an excessive amount of data to either generate or manage. Am I overestimating how vendors intend to use the version string? We'd also need to consider devices that we could create, for instance providing the same interface enumeration prior to creating an mdev device to have a confidence level that the new device would be a valid target. We defined the string as opaque to allow vendor flexibility and because defining a common format is hard. Do we need to revisit this part of the discussion to define the version string as non-opaque with parsing rules, probably with separate incoming vs outgoing interfaces? Thanks, Alex From dev.faz at gmail.com Tue Jul 14 16:44:05 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Tue, 14 Jul 2020 18:44:05 +0200 Subject: [octavia] Replace broken amphoras In-Reply-To: References: Message-ID: Hi, Am Di., 14. Juli 2020 um 02:04 Uhr schrieb Michael Johnson < johnsomor at gmail.com>: > Sorry you have run into trouble and we have missed you in the IRC channel. > Thanks for your great work and support! > Yeah, that transcript from three years ago isn't going to be much help. > Arg. > A few things we will want to know are: > 1. 
What version of Octavia are you using? > 3.1.0 > 2. Do you have the DNS extension to neutron enabled? > yes > 3. When it said "unable to attach port to amphora", can you provide > the full error? Was it due to a hostname mismatch error from nova? > arg, debug logs got already rotated. I will repeat my debug-session and paste the output. Any suggestions what I should do? Maybe I can already try something different? My guess is you ran into the issue where a port will not attach if the > DNS name doesn't match. Our workaround for that accidentally got > removed and re-added in https://review.opendev.org/#/c/663277/. > So, this should already be fixed in stable/rocky. Should upgrading octavia to latest stable/rocky be enough to get my amphoras working again? Replacing a vrrp_port is tricky, so I'm not surprised you ran into > some trouble. Can you please provide the controller worker log output > when doing a load balancer failover (let's not use amphora failover > here) on paste.openstack.org? You can mark it private and directly > reply to me if you have concerns about the log content. > Will provide this asap. > All this said, I have recently completely refactored the failover > flows recently. This has already merged on the master branch and > backports are in process. > Thanks a lot, Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Tue Jul 14 16:40:11 2020 From: thomas.king at gmail.com (Thomas King) Date: Tue, 14 Jul 2020 10:40:11 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: I have. That's the Triple-O docs and they don't go through the normal .conf files to explain how it works outside of Triple-O. It has some ideas but no running configurations. Tom King On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis wrote: > hi, have you checked: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html > ? > I am following this link. I only have one network, having different issues > tho ;) > > > > On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: > >> Thank you, Amy! >> >> Tom >> >> On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: >> >>> Hey Tom, >>> >>> Adding the OpenStack discuss list as I think you got several replies >>> from there as well. >>> >>> Thanks, >>> >>> Amy (spotz) >>> >>> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >>> wrote: >>> >>>> Good day, >>>> >>>> I'm bringing up a thread from June about DHCP relay with neutron >>>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>>> not have the plain config/neutron config to show how a regular Ironic setup >>>> would use DHCP relay. >>>> >>>> The Neutron segments docs state that I must have a unique physical >>>> network name. If my Ironic controller has a single provisioning network >>>> with a single physical network name, doesn't this prevent my use of >>>> multiple segments? >>>> >>>> Further, the segments docs state this: "The operator must ensure that >>>> every compute host that is supposed to participate in a router provider >>>> network has direct connectivity to one of its segments." (section 3 at >>>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>>> current docs state the same thing) >>>> This defeats the purpose of using DHCP relay, though, where the Ironic >>>> controller does *not* have direct connectivity to the remote segment. 
>>>> >>>> Here is a rough drawing - what is wrong with my thinking here? >>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay >>>> <------> Ironic controller, provisioning network: 10.146.29.192/26 >>>> VLAN 2115 >>>> >>>> Thank you, >>>> Tom King >>>> _______________________________________________ >>>> openstack-mentoring mailing list >>>> openstack-mentoring at lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>>> >>> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berrange at redhat.com Tue Jul 14 16:47:22 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 14 Jul 2020 17:47:22 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714101616.5d3a9e75@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> Message-ID: <20200714164722.GL25187@redhat.com> On Tue, Jul 14, 2020 at 10:16:16AM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > what versions are supported on a given device. This puts the job of > > reporting compatible versions directly under the responsibility of the > > vendor who writes the kernel driver for it. They are the ones with the > > best knowledge of the hardware they've built and the rules around its > > compatibility. > > The version string discussed previously is the version string that > represents a given device, possibly including driver information, > configuration, etc. I think what you're asking for here is an > enumeration of every possible version string that a given device could > accept as an incoming migration stream. If we consider the string as > opaque, that means the vendor driver needs to generate a separate > string for every possible version it could accept, for every possible > configuration option. That potentially becomes an excessive amount of > data to either generate or manage. 
> > Am I overestimating how vendors intend to use the version string? If I'm interpreting your reply & the quoted text orrectly, the version string isn't really a version string in any normal sense of the word "version". Instead it sounds like string encoding a set of features in some arbitrary vendor specific format, which they parse and do compatibility checks on individual pieces ? One or more parts may contain a version number, but its much more than just a version. If that's correct, then I'd prefer we didn't call it a version string, instead call it a "capability string" to make it clear it is expressing a much more general concept, but... > We'd also need to consider devices that we could create, for instance > providing the same interface enumeration prior to creating an mdev > device to have a confidence level that the new device would be a valid > target. > > We defined the string as opaque to allow vendor flexibility and because > defining a common format is hard. Do we need to revisit this part of > the discussion to define the version string as non-opaque with parsing > rules, probably with separate incoming vs outgoing interfaces? Thanks, ..even if the huge amount of flexibility is technically relevant from the POV of the hardware/drivers, we should consider whether management apps actually want, or can use, that level of flexibility. The task of picking which host to place a VM on has alot of factors to consider, and when there are a large number of hosts, the total amount of information to check gets correspondingly large. The placement process is also fairly performance critical. Running complex algorithmic logic to check compatibility of devices based on a arbitrary set of rules is likely to be a performance challenge. A flat list of supported strings is a much simpler thing to check as it reduces down to a simple set membership test. IOW, even if there's some complex set of device type / vendor specific rules to check for compatibility, I fear apps will ignore them and just define a very simplified list of compatible string, and ignore all the extra flexibility. I'm sure OpenStack maintainers can speak to this more, as they've put alot of work into their scheduling engine to optimize the way it places VMs largely driven from simple structured data reported from hosts. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From alex.williamson at redhat.com Tue Jul 14 17:01:48 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 11:01:48 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> Message-ID: <20200714110148.0471c03c@x1.home> On Tue, 14 Jul 2020 13:33:24 +0100 Sean Mooney wrote: > On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote: > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > e.g. 
we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > mdev live migration is completely possible to do but i agree with Dan barrange's comments > from the point of view of openstack integration i dont see calling out to a vender sepecific > tool to be an accpetable As I replied to Dan, I'm hoping Yan was referring more to vendor specific knowledge rather than actual tools. > solutions for device compatiablity checking. the sys filesystem > that describs the mdevs that can be created shoudl also > contain the relevent infomation such > taht nova could integrate it via libvirt xml representation or directly retrive the > info from > sysfs. > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > so vf to vf migration is not possible in the general case as there is no standarised > way to transfer teh device state as part of the siorv specs produced by the pci-sig > as such there is not vender neutral way to support sriov live migration. We're not talking about a general case, we're talking about physical devices which have vfio wrappers or hooks with device specific knowledge in order to support the vfio migration interface. The point is that a discussion around vfio device migration cannot be limited to mdev devices. > > > - a src MDEV can migration to a target VF in SRIOV. > that also makes this unviable > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to check > > > if one device is able to migrate to another device before triggering a real > > > live migration procedure. > well actully that is already too late really. ideally we would want to do this compaiablity > check much sooneer to avoid the migration failing. in an openstack envionment at least > by the time we invoke libvirt (assuming your using the libvirt driver) to do the migration we have alreaedy > finished schduling the instance to the new host. if if we do the compatiablity check at this point > and it fails then the live migration is aborted and will not be retired. These types of late check lead to a > poor user experince as unless you check the migration detial it basically looks like the migration was ignored > as it start to migrate and then continuge running on the orgininal host. > > when using generic pci passhotuhg with openstack, the pci alias is intended to reference a single vendor id/product > id so you will have 1+ alias for each type of device. that allows openstack to schedule based on the availability of a > compatibale device because we track inventories of pci devices and can query that when selecting a host. > > if we were to support mdev live migration in the future we would want to take the same declarative approch. > 1 interospec the capability of the deivce we manage > 2 create inventories of the allocatable devices and there capabilities > 3 schdule the instance to a host based on the device-type/capabilities and claim it atomicly to prevent raceces > 4 have the lower level hyperviors do addtional validation if need prelive migration. > > this proposal seams to be targeting extending step 4 where as ideally we should focuse on providing the info that would > be relevant in set 1 preferably in a vendor neutral way vai a kernel interface like /sys. I think this is reading a whole lot into the phrase "last step". We want to make the information available for a management engine to consume as needed to make informed decisions regarding likely compatible target devices. 
> > > we are not sure if this interface is of value or help to you. please don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each device's > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > this might be useful as we could tag the inventory with the migration version and only might to > devices with the same version Is cross version compatibility something that you'd consider using? > > > userspace tools read the migration_version as a string from the source device, > > > and write it to the migration_version sysfs attribute in the target device. > this would not be useful as the schduler cannot directlly connect to the compute host > and even if it could it would be extreamly slow to do this for 1000s of hosts and potentally > multiple devices per host. Seems similar to Dan's requirement, looks like the 'read for version, write for compatibility' test idea isn't really viable. > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > - any one of the two devices does not have a migration_version attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > opaque vendor specific stings that higher level orchestros have to pass form host > to host and cant reason about are evil, when allowed they prolifroate and > makes any idea of a vendor nutral abstraction and interoperablity between systems > impossible to reason about. that said there is a way to make it opaue but still useful > to userspace. see below > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > honestly i would much prefer if the version string was just a semver string. > e.g. {major}.{minor}.{bugfix} > > if you do a driver/frimware update and break compatiablity with an older version bump the > major version. > > if you add optional a feature that does not break backwards compatiablity if you migrate > an older instance to the new host then just bump the minor/feature number. > > if you have a fix for a bug that does not change the feature set or compatiblity backwards or > forwards then bump the bugfix number > > then the check is as simple as > 1.) is the mdev type the same > 2.) is the major verion the same > 3.) 
am i going form the same version to same version or same version to newer version > > if all 3 are true we can migrate. > e.g. > 2.0.1 -> 2.1.1 (ok same major version and migrating from older feature release to newer feature release) > 2.1.1 -> 2.0.1 (not ok same major version and migrating from new feature release to old feature release may be > incompatable) > 2.0.0 -> 3.0.0 (not ok chaning major version) > 2.0.1 -> 2.0.0 (ok same major and minor version, all bugfixs in the same minor release should be compatibly) What's the value of the bugfix field in this scheme? The simplicity is good, but is it too simple. It's not immediately clear to me whether all features can be hidden behind a minor version. For instance, if we have an mdev device that supports this notion of aggregation, which is proposed as a solution to the problem that physical hardware might support lots and lots of assignable interfaces which can be combined into arbitrary sets for mdev devices, making it impractical to expose an mdev type for every possible enumeration of assignable interfaces within a device. We therefore expose a base type where the aggregation is built later. This essentially puts us in a scenario where even within an mdev type running on the same driver, there are devices that are not directly compatible with each other. > we dont need vendor to rencode the driver name or vendor id and product id in the string. that info is alreay > available both to the device driver and to userspace via /sys already we just need to know if version of > the same mdev are compatiable so a simple semver version string which is well know in the software world > at least is a clean abstration we can reuse. This presumes there's no cross device migration. An mdev type can only be migrated to the same mdev type, all of the devices within that type have some based compatibility, a phsyical device can only be migrated to the same physical device. In the latter case what defines the type? If it's a PCI device, is it only vendor:device IDs? What about revision? What about subsystem IDs? What about possibly an onboard ROM or internal firmware? The information may be available, but which things are relevant to migration? We already see desires to allow migration between physical and mdev, but also to expose mdev types that might be composable to be compatible with other types. Thanks, Alex From sgolovat at redhat.com Tue Jul 14 17:03:55 2020 From: sgolovat at redhat.com (Sergii Golovatiuk) Date: Tue, 14 Jul 2020 19:03:55 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: Hi, +1. Thank you Rabi! вт, 14 июл. 2020 г. в 17:53, Wesley Hayutin : > > > On Tue, Jul 14, 2020 at 9:11 AM Bogdan Dobrelya > wrote: > >> On 7/14/20 3:30 PM, Emilien Macchi wrote: >> > Hi folks, >> > >> > Rabi has proved deep technical understanding on the TripleO components >> over the >> > last years. >> > Initially as a major maintainer of the Heat project and then a regular >> > contributor to TripleO, he got involved at different levels: >> > - Optimization of the Heat templates, to reduce the number of resources >> or >> > improve them to make it faster and more efficient at scale. >> > - Migration of the Mistral workflows into native Ansible modules and >> Python code >> > into tripleo-common, with end-to-end expertise. >> > - Regular contributions to the container tooling integration. 
>> > >> > Being involved on the mailing-list and IRC channels, Rabi is always >> helpful to >> > the community and here to help. >> > He has provided thorough reviews in principal components on TripleO as >> well as a >> > lot of bug fixes or new features; which contributed to make TripleO >> more stable >> > and scalable. I would like to propose him be part of the TripleO core >> team. >> > >> > Thanks Rabi for your hard work! >> >> +1 >> >> > -- >> > Emilien Macchi >> >> Thanks for raising this Emilien!! Thank you to Rabi for your excellent > work! > +1 > >> >> -- >> Best regards, >> Bogdan Dobrelya, >> Irc #bogdando >> >> >> -- Sergii Golovatiuk Senior Software Developer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgilbert at redhat.com Tue Jul 14 17:19:46 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Tue, 14 Jul 2020 18:19:46 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714101616.5d3a9e75@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> Message-ID: <20200714171946.GL2728@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to check > > > if one device is able to migrate to another device before triggering a real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each device's > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > userspace tools read the migration_version as a string from the source device, > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > - any one of the two devices does not have a migration_version attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. 
> > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. (As I've said in the previous discussion, off one of the patch series) My view is it makes sense to have a half-way house on the opaqueness of this string; I'd expect to have an ID and version that are human readable, maybe a device ID/name that's human interpretable and then a bunch of other cruft that maybe device/vendor/version specific. I'm thinking that we want to be able to report problems and include the string and the user to be able to easily identify the device that was complaining and notice a difference in versions, and perhaps also use it in compatibility patterns to find compatible hosts; but that does get tricky when it's a 'ask the device if it's compatible'. Dave > > > (2) backgrounds > > > > > > The reason we hope the migration_version string is opaque to the userspace > > > is that it is hard to generalize standard comparing fields and comparing > > > methods for different devices from different vendors. > > > Though userspace now could still do a simple string compare to check if > > > two devices are compatible, and result should also be right, it's still > > > too limited as it excludes the possible candidate whose migration_version > > > string fails to be equal. > > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > > with another MDEV with mdev_type_3, aggregator count 1, even their > > > migration_version strings are not equal. > > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > > > besides that, driver version + configured resources are all elements demanding > > > to take into account. > > > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > > in a simple reading from source side and writing for test in the target side way. > > > > > > > > > we then think the device compatibility issues for live migration with assigned > > > devices can be divided into two steps: > > > a. management tools filter out possible migration target devices. > > > Tags could be created according to info from product specification. > > > we think openstack/ovirt may have vendor proprietary components to create > > > those customized tags for each product from each vendor. 
> > > > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > > search target vGPU are like: > > > a tag for compatible parent PCI IDs, > > > a tag for a range of gvt driver versions, > > > a tag for a range of mdev type + aggregator count > > > > > > for NVMe VF, the tags to search target VF may be like: > > > a tag for compatible PCI IDs, > > > a tag for a range of driver versions, > > > a tag for URL of configured remote storage. > > I interpret this as hand waving, ie. the first step is for management > tools to make a good guess :-\ We don't seem to be willing to say that > a given mdev type can only migrate to a device with that same type. > There's this aggregation discussion happening separately where a base > mdev type might be created or later configured to be equivalent to a > different type. The vfio migration API we've defined is also not > limited to mdev devices, for example we could create vendor specific > quirks or hooks to provide migration support for a physical PF/VF > device. Within the realm of possibility then is that we could migrate > between a physical device and an mdev device, which are simply > different degrees of creating a virtualization layer in front of the > device. > > > Requiring management application developers to figure out this possible > > compatibility based on prod specs is really unrealistic. Product specs > > are typically as clear as mud, and with the suggestion we consider > > different rules for different types of devices, add up to a huge amount > > of complexity. This isn't something app developers should have to spend > > their time figuring out. > > Agreed. > > > The suggestion that we make use of vendor proprietary helper components > > is totally unacceptable. We need to be able to build a solution that > > works with exclusively an open source software stack. > > I'm surprised to see this as well, but I'm not sure if Yan was really > suggesting proprietary software so much as just vendor specific > knowledge. > > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > what versions are supported on a given device. This puts the job of > > reporting compatible versions directly under the responsibility of the > > vendor who writes the kernel driver for it. They are the ones with the > > best knowledge of the hardware they've built and the rules around its > > compatibility. > > The version string discussed previously is the version string that > represents a given device, possibly including driver information, > configuration, etc. I think what you're asking for here is an > enumeration of every possible version string that a given device could > accept as an incoming migration stream. If we consider the string as > opaque, that means the vendor driver needs to generate a separate > string for every possible version it could accept, for every possible > configuration option. That potentially becomes an excessive amount of > data to either generate or manage. > > Am I overestimating how vendors intend to use the version string? > > We'd also need to consider devices that we could create, for instance > providing the same interface enumeration prior to creating an mdev > device to have a confidence level that the new device would be a valid > target. > > We defined the string as opaque to allow vendor flexibility and because > defining a common format is hard. 
Do we need to revisit this part of > the discussion to define the version string as non-opaque with parsing > rules, probably with separate incoming vs outgoing interfaces? Thanks, > > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From johnsomor at gmail.com Tue Jul 14 18:02:10 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 14 Jul 2020 11:02:10 -0700 Subject: [octavia] Replace broken amphoras In-Reply-To: References: Message-ID: Hi again, So looking at the patch in question, yes, upgrading to the latest version of Octavia for Rocky, 3.2.2 will resolve the DNS issue going forward. It was originally included in the 3.2.0 release for Rocky, but we would recommend updating to the latest Rocky release, 3.2.2. Michael On Tue, Jul 14, 2020 at 9:44 AM Fabian Zimmermann wrote: > > Hi, > > Am Di., 14. Juli 2020 um 02:04 Uhr schrieb Michael Johnson : >> >> Sorry you have run into trouble and we have missed you in the IRC channel. > > Thanks for your great work and support! > >> >> Yeah, that transcript from three years ago isn't going to be much help. > > Arg. > >> >> A few things we will want to know are: >> 1. What version of Octavia are you using? > > > 3.1.0 > >> >> 2. Do you have the DNS extension to neutron enabled? > > > yes > >> >> 3. When it said "unable to attach port to amphora", can you provide >> the full error? Was it due to a hostname mismatch error from nova? > > > arg, debug logs got already rotated. I will repeat my debug-session and paste the output. > > Any suggestions what I should do? Maybe I can already try something different? > >> My guess is you ran into the issue where a port will not attach if the >> DNS name doesn't match. Our workaround for that accidentally got >> removed and re-added in https://review.opendev.org/#/c/663277/. > > > So, this should already be fixed in stable/rocky. Should upgrading octavia to latest stable/rocky be enough to get my amphoras working again? > >> Replacing a vrrp_port is tricky, so I'm not surprised you ran into >> some trouble. Can you please provide the controller worker log output >> when doing a load balancer failover (let's not use amphora failover >> here) on paste.openstack.org? You can mark it private and directly >> reply to me if you have concerns about the log content. > > > Will provide this asap. > >> >> All this said, I have recently completely refactored the failover >> flows recently. This has already merged on the master branch and >> backports are in process. > > > Thanks a lot, > > Fabian From alex.williamson at redhat.com Tue Jul 14 20:47:15 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 14:47:15 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714164722.GL25187@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714164722.GL25187@redhat.com> Message-ID: <20200714144715.0ef70074@x1.home> On Tue, 14 Jul 2020 17:47:22 +0100 Daniel P. Berrangé wrote: > On Tue, Jul 14, 2020 at 10:16:16AM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 11:21:29 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > driver and is completely opaque to the userspace. 
> > > > for a Intel vGPU, string format can be defined like > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > for a QAT VF, it may be > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > It's very strange to define it as opaque and then proceed to describe > > the contents of that opaque string. The point is that its contents > > are defined by the vendor driver to describe the device, driver version, > > and possibly metadata about the configuration of the device. One > > instance of a device might generate a different string from another. > > The string that a device produces is not necessarily the only string > > the vendor driver will accept, for example the driver might support > > backwards compatible migrations. > > > > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > > what versions are supported on a given device. This puts the job of > > > reporting compatible versions directly under the responsibility of the > > > vendor who writes the kernel driver for it. They are the ones with the > > > best knowledge of the hardware they've built and the rules around its > > > compatibility. > > > > The version string discussed previously is the version string that > > represents a given device, possibly including driver information, > > configuration, etc. I think what you're asking for here is an > > enumeration of every possible version string that a given device could > > accept as an incoming migration stream. If we consider the string as > > opaque, that means the vendor driver needs to generate a separate > > string for every possible version it could accept, for every possible > > configuration option. That potentially becomes an excessive amount of > > data to either generate or manage. > > > > Am I overestimating how vendors intend to use the version string? > > If I'm interpreting your reply & the quoted text orrectly, the version > string isn't really a version string in any normal sense of the word > "version". > > Instead it sounds like string encoding a set of features in some arbitrary > vendor specific format, which they parse and do compatibility checks on > individual pieces ? One or more parts may contain a version number, but > its much more than just a version. > > If that's correct, then I'd prefer we didn't call it a version string, > instead call it a "capability string" to make it clear it is expressing > a much more general concept, but... I'd agree with that. The intent of the previous proposal was to provide and interface for reading a string and writing a string back in where the result of that write indicated migration compatibility with the device. So yes, "version" is not the right term. > > We'd also need to consider devices that we could create, for instance > > providing the same interface enumeration prior to creating an mdev > > device to have a confidence level that the new device would be a valid > > target. > > > > We defined the string as opaque to allow vendor flexibility and because > > defining a common format is hard. 
Do we need to revisit this part of > > the discussion to define the version string as non-opaque with parsing > > rules, probably with separate incoming vs outgoing interfaces? Thanks, > > ..even if the huge amount of flexibility is technically relevant from the > POV of the hardware/drivers, we should consider whether management apps > actually want, or can use, that level of flexibility. > > The task of picking which host to place a VM on has alot of factors to > consider, and when there are a large number of hosts, the total amount > of information to check gets correspondingly large. The placement > process is also fairly performance critical. > > Running complex algorithmic logic to check compatibility of devices > based on a arbitrary set of rules is likely to be a performance > challenge. A flat list of supported strings is a much simpler > thing to check as it reduces down to a simple set membership test. > > IOW, even if there's some complex set of device type / vendor specific > rules to check for compatibility, I fear apps will ignore them and > just define a very simplified list of compatible string, and ignore > all the extra flexibility. There's always the "try it and see if it works" interface, which is essentially what we have currently. With even a simple version of what we're trying to accomplish here, there's still a risk that a management engine might rather just ignore it and restrict themselves to 1:1 mdev type matches, with or without knowing anything about the vendor driver version, relying on the migration to fail quickly if the devices are incompatible. If the complexity of the interface makes it too complicated or time consuming to provide sufficient value above such an algorithm, there's not much point to implementing it, which is why Yan has included so many people in this discussion. > I'm sure OpenStack maintainers can speak to this more, as they've put > alot of work into their scheduling engine to optimize the way it places > VMs largely driven from simple structured data reported from hosts. I think we've weeded out that our intended approach is not worthwhile, testing a compatibility string at a device is too much overhead, we need to provide enough information to the management engine to predict the response without interaction beyond the initial capability probing. As you've identified above, we're really dealing with more than a simple version, we need to construct a compatibility string and we need to start defining what goes into that. The first item seems to be that we're defining compatibility relative to a vfio migration stream, vfio devices have a device API, such as vfio-pci, so the first attribute might simply define the device API. Once we have a class of devices we might then be able to use bus specific attributes, for example the PCI vendor and device ID (other bus types TBD). We probably also need driver version numbers, so we need to include both the driver name as well as version major and minor numbers. Rules need to be put in place around what we consider to be viable version matches, potentially as Sean described. For example, does the major version require a match? Do we restrict to only formward, ie. increasing, minor number matches within that major verison? Do we then also have section that includes any required device attributes to result in a compatible device. This would be largely focused on mdev, but I wouldn't rule out others. 
For example if an aggregation parameter is required to maintain compatibility, we'd want to specify that as a required attribute. So maybe we end up with something like: { "device_api": "vfio-pci", "vendor": "vendor-driver-name", "version": { "major": 0, "minor": 1 }, "vfio-pci": { // Based on above device_api "vendor": 0x1234, // Values for the exposed device "device": 0x5678, // Possibly further parameters for a more specific match } "mdev_attrs": [ { "attribute0": "VALUE" } ] } The sysfs interface would return an array containing one or more of these for each device supported. I'm trying to account for things like aggregation via the mdev_attrs section, but I haven't really put it all together yet. I think Intel folks want to be able to say mdev type foo-3 is compatible with mdev type foo-1 so long as foo-1 is created with an aggregation attribute value of 3, but I expect both foo-1 and foo-3 would have the same user visible PCI vendor:device IDs If we use mdev type rather than the resulting device IDs, then we introduce an barrier to phys<->mdev migration. We could specify the subsystem values though, for example foo-1 might correspond to subsystem IDs 8086:0001 and foo3 8086:0003, then we can specify that creating an foo-1 from this device doesn't require any attributes, but creating a foo-3 does. I'm nervous how that scales though. NB. I'm also considering how portions of this might be compatible with mdevctl such that we could direct mdevctl to create a compatible device using information from this compatibility interface. Thanks, Alex From kevin at cloudnull.com Wed Jul 15 03:52:22 2020 From: kevin at cloudnull.com (Carter, Kevin) Date: Tue, 14 Jul 2020 22:52:22 -0500 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: Absolutely, +1 On Tue, Jul 14, 2020 at 08:35 Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > > -- > Emilien Macchi > -- Kevin Carter IRC: Cloudnull -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Wed Jul 15 06:26:13 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 15 Jul 2020 08:26:13 +0200 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: I have deployed that with tripleO, but now we are recabling and redeploying it. So once I have it running I can share my configs, just name which you want :) On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: > I have. 
That's the Triple-O docs and they don't go through the normal > .conf files to explain how it works outside of Triple-O. It has some ideas > but no running configurations. > > Tom King > > On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis > wrote: > >> hi, have you checked: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >> ? >> I am following this link. I only have one network, having different >> issues tho ;) >> >> >> >> On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: >> >>> Thank you, Amy! >>> >>> Tom >>> >>> On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: >>> >>>> Hey Tom, >>>> >>>> Adding the OpenStack discuss list as I think you got several replies >>>> from there as well. >>>> >>>> Thanks, >>>> >>>> Amy (spotz) >>>> >>>> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >>>> wrote: >>>> >>>>> Good day, >>>>> >>>>> I'm bringing up a thread from June about DHCP relay with neutron >>>>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>>>> not have the plain config/neutron config to show how a regular Ironic setup >>>>> would use DHCP relay. >>>>> >>>>> The Neutron segments docs state that I must have a unique physical >>>>> network name. If my Ironic controller has a single provisioning network >>>>> with a single physical network name, doesn't this prevent my use of >>>>> multiple segments? >>>>> >>>>> Further, the segments docs state this: "The operator must ensure that >>>>> every compute host that is supposed to participate in a router provider >>>>> network has direct connectivity to one of its segments." (section 3 at >>>>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>>>> current docs state the same thing) >>>>> This defeats the purpose of using DHCP relay, though, where the Ironic >>>>> controller does *not* have direct connectivity to the remote segment. >>>>> >>>>> Here is a rough drawing - what is wrong with my thinking here? >>>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP >>>>> relay <------> Ironic controller, provisioning network: >>>>> 10.146.29.192/26 VLAN 2115 >>>>> >>>>> Thank you, >>>>> Tom King >>>>> _______________________________________________ >>>>> openstack-mentoring mailing list >>>>> openstack-mentoring at lists.openstack.org >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>>>> >>>> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From michele at acksyn.org Wed Jul 15 06:28:59 2020 From: michele at acksyn.org (Michele Baldessari) Date: Wed, 15 Jul 2020 08:28:59 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: <20200715062859.GA2712@holtby.localdomain> +1 On Tue, Jul 14, 2020 at 10:52:22PM -0500, Carter, Kevin wrote: > Absolutely, +1 > > On Tue, Jul 14, 2020 at 08:35 Emilien Macchi wrote: > > > Hi folks, > > > > Rabi has proved deep technical understanding on the TripleO components > > over the last years. > > Initially as a major maintainer of the Heat project and then a regular > > contributor to TripleO, he got involved at different levels: > > - Optimization of the Heat templates, to reduce the number of resources or > > improve them to make it faster and more efficient at scale. 
> > - Migration of the Mistral workflows into native Ansible modules and > > Python code into tripleo-common, with end-to-end expertise. > > - Regular contributions to the container tooling integration. > > > > Being involved on the mailing-list and IRC channels, Rabi is always > > helpful to the community and here to help. > > He has provided thorough reviews in principal components on TripleO as > > well as a lot of bug fixes or new features; which contributed to make > > TripleO more stable and scalable. I would like to propose him be part of > > the TripleO core team. > > > > Thanks Rabi for your hard work! > > > > -- > > Emilien Macchi > > > -- > Kevin Carter > IRC: Cloudnull -- Michele Baldessari C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D From cjeanner at redhat.com Wed Jul 15 11:01:47 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Wed, 15 Jul 2020 13:01:47 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: Of course +1! On 7/14/20 3:30 PM, Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources > or improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi -- Cédric Jeanneret (He/Him/His) Sr. Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From johfulto at redhat.com Wed Jul 15 12:00:40 2020 From: johfulto at redhat.com (John Fulton) Date: Wed, 15 Jul 2020 08:00:40 -0400 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 I thought he was already a core. On Wed, Jul 15, 2020 at 7:05 AM Cédric Jeanneret wrote: > Of course +1! > > On 7/14/20 3:30 PM, Emilien Macchi wrote: > > Hi folks, > > > > Rabi has proved deep technical understanding on the TripleO components > > over the last years. > > Initially as a major maintainer of the Heat project and then a regular > > contributor to TripleO, he got involved at different levels: > > - Optimization of the Heat templates, to reduce the number of resources > > or improve them to make it faster and more efficient at scale. > > - Migration of the Mistral workflows into native Ansible modules and > > Python code into tripleo-common, with end-to-end expertise. > > - Regular contributions to the container tooling integration. > > > > Being involved on the mailing-list and IRC channels, Rabi is always > > helpful to the community and here to help. 
> > He has provided thorough reviews in principal components on TripleO as > > well as a lot of bug fixes or new features; which contributed to make > > TripleO more stable and scalable. I would like to propose him be part of > > the TripleO core team. > > > > Thanks Rabi for your hard work! > > -- > > Emilien Macchi > > -- > Cédric Jeanneret (He/Him/His) > Sr. Software Engineer - OpenStack Platform > Deployment Framework TC > Red Hat EMEA > https://www.redhat.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Wed Jul 15 12:21:28 2020 From: zigo at debian.org (Thomas Goirand) Date: Wed, 15 Jul 2020 14:21:28 +0200 Subject: Floating IP's for routed networks In-Reply-To: <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Message-ID: Sending the message again with the correct From, as I'm not subscribed to the list with the other mailbox. On 7/15/20 2:13 PM, Thomas Goirand wrote: > Hi Ryan, > > If you don't mind, I'm adding the openstack-discuss list in the loop, as > this topic may be of interest to others. > > For mailing list readers, I'm trying to implement this: > https://review.opendev.org/#/c/669395/ > but I'm having some difficulties. > > I did a bit of investigation with some added LOG.info() in the code. > > When doing: > >> openstack subnet create vm-fip \ >> --subnet-range 10.66.20.0/24 \ >> --service-type 'network:routed' \ >> --service-type 'network:floatingip' \ >> --network multisegment1 > > Here's where neutron-api crashes. in db/ipam_backend_mixin.py: > > def _validate_segment(self, context, network_id, segment_id, > action=None, > old_segment_id=None): > # TODO(tidwellr) Create and use a constant for the service type > segments = subnet_obj.Subnet.get_subnet_segment_ids( > context, network_id, filtered_service_type='network:routed') > > associated_segments = set(segments) > if None in associated_segments and len(associated_segments) > 1: > raise segment_exc.SubnetsNotAllAssociatedWithSegments( > network_id=network_id) > > SubnetsNotAllAssociatedWithSegments() is raised, as you must already > guessed. Here's the values... > > associated_segments is an array containing 3 values: 2 being the IDs of > the segments I added previously, the 3rd one being None. This test is > then matched. Where is that None value coming from? Is this the new > subnet I'm trying to add? Maybe the > filtered_service_type='network:routed' in the call: > subnet_obj.Subnet.get_subnet_segment_ids() isn't working as expected? > > Printing the SQL query that is checked shows: > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > WHERE subnets.network_id = %(network_id_1)s AND subnets.id NOT IN > (SELECT subnet_service_types.subnet_id AS subnet_service_types_subnet_id > FROM subnet_service_types > WHERE subnets.network_id = %(network_id_2)s AND > subnet_service_types.subnet_id = subnets.id AND > subnet_service_types.service_type = %(service_type_1)s) > > though when doing by hand: > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > > the db has only 2 subnets, so it looks like the floating-ip subnet got > added before the check, and is then removed when the above test fails. > > So I just removed the raise, and could add the subnet I wanted, but > that's obviously not a long term solution. 
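> To illustrate why the guard fires, with the values from my debug prints
> (the segment IDs below are placeholders), the check effectively reduces to:
>
>     # what get_subnet_segment_ids() returned during the failing call
>     segments = ['<segment-1-id>', '<segment-2-id>', None]
>     associated_segments = set(segments)
>     # None is present and there is more than one member, so
>     # SubnetsNotAllAssociatedWithSegments is raised
>     matched = None in associated_segments and len(associated_segments) > 1
>
> so everything hinges on where that None row comes from.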
> > Your thoughts? > > Another problem that I'm having, is that neutron-bgp-dragent is not > receiving (or processing) the messages from neutron-rpc-server. I've > enabled DEBUG mode for oslo_messaging, and found out that when dr-agent > starts and prints "Agent has just been revived. Scheduling full sync", > it does send a message to neutron-rpc-server, which is replied, but it > doesn't look like dr-agent processes the return message in its reply > queue, and then prints in the logs: "imeout in RPC method > get_bgp_speakers. Waiting for 17 seconds before next attempt. If the > server is not down, consider increasing the rpc_response_timeout option > as Neutron server(s) may be overloaded and unable to respond quickly > enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting > for a reply to message ID c1b401c9e10d481bb5e071f2c048e480". What is > weird is that a few times (rarely), it worked, and the agent gets the reply. > > What should I do to investigate further? > > Cheers, > > Thomas Goirand (zigo) > From alex.williamson at redhat.com Tue Jul 14 20:59:48 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 14:59:48 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714171946.GL2728@work-vm> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> Message-ID: <20200714145948.17b95eb3@x1.home> On Tue, 14 Jul 2020 18:19:46 +0100 "Dr. David Alan Gilbert" wrote: > * Alex Williamson (alex.williamson at redhat.com) wrote: > > On Tue, 14 Jul 2020 11:21:29 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > hi folks, > > > > we are defining a device migration compatibility interface that helps upper > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > live migration compatible. > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > e.g. we could use it to check whether > > > > - a src MDEV can migrate to a target MDEV, > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > if one device is able to migrate to another device before triggering a real > > > > live migration procedure. > > > > we are not sure if this interface is of value or help to you. please don't > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > (1) interface definition > > > > The interface is defined in below way: > > > > > > > > __ userspace > > > > /\ \ > > > > / \write > > > > / read \ > > > > ________/__________ ___\|/_____________ > > > > | migration_version | | migration_version |-->check migration > > > > --------------------- --------------------- compatibility > > > > device A device B > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > userspace tools read the migration_version as a string from the source device, > > > > and write it to the migration_version sysfs attribute in the target device. 
> > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > - any one of the two devices does not have a migration_version attribute > > > > - error when reading from migration_version attribute of one device > > > > - error when writing migration_version string of one device to > > > > migration_version attribute of the other device > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > driver and is completely opaque to the userspace. > > > > for a Intel vGPU, string format can be defined like > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > for a QAT VF, it may be > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > It's very strange to define it as opaque and then proceed to describe > > the contents of that opaque string. The point is that its contents > > are defined by the vendor driver to describe the device, driver version, > > and possibly metadata about the configuration of the device. One > > instance of a device might generate a different string from another. > > The string that a device produces is not necessarily the only string > > the vendor driver will accept, for example the driver might support > > backwards compatible migrations. > > (As I've said in the previous discussion, off one of the patch series) > > My view is it makes sense to have a half-way house on the opaqueness of > this string; I'd expect to have an ID and version that are human > readable, maybe a device ID/name that's human interpretable and then a > bunch of other cruft that maybe device/vendor/version specific. > > I'm thinking that we want to be able to report problems and include the > string and the user to be able to easily identify the device that was > complaining and notice a difference in versions, and perhaps also use > it in compatibility patterns to find compatible hosts; but that does > get tricky when it's a 'ask the device if it's compatible'. In the reply I just sent to Dan, I gave this example of what a "compatibility string" might look like represented as json: { "device_api": "vfio-pci", "vendor": "vendor-driver-name", "version": { "major": 0, "minor": 1 }, "vfio-pci": { // Based on above device_api "vendor": 0x1234, // Values for the exposed device "device": 0x5678, // Possibly further parameters for a more specific match }, "mdev_attrs": [ { "attribute0": "VALUE" } ] } Are you thinking that we might allow the vendor to include a vendor specific array where we'd simply require that both sides have matching fields and values? ie. "vendor_fields": [ { "unknown_field0": "unknown_value0" }, { "unknown_field1": "unknown_value1" }, ] We could certainly make that part of the spec, but I can't really figure the value of it other than to severely restrict compatibility, which the vendor could already do via the version.major value. Maybe they'd want to put a build timestamp, random uuid, or source sha1 into such a field to make absolutely certain compatibility is only determined between identical builds? 
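To make that concrete, matching such an array would presumably reduce to a
plain field-by-field equality test on the management side, along the lines
of this sketch (descriptor layout as in the examples above; nothing beyond
that is assumed):

    def vendor_fields_match(src_desc, dst_desc):
        # Flatten the vendor_fields arrays into dicts and require an exact
        # match; any missing field or differing value means the devices are
        # reported as not compatible.
        src = {k: v for d in src_desc.get("vendor_fields", []) for k, v in d.items()}
        dst = {k: v for d in dst_desc.get("vendor_fields", []) for k, v in d.items()}
        return src == dst

which can only ever narrow compatibility beyond what version.major already
expresses.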
Thanks, Alex From smooney at redhat.com Tue Jul 14 21:15:33 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 14 Jul 2020 22:15:33 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714110148.0471c03c@x1.home> Message-ID: <8ef6f52dd7e03d19c7d862350f2d1ecf070f1d63.camel@redhat.com> resending with full cc list since i had this typed up i would blame my email provier but my email client does not seam to like long cc lists. we probably want to continue on alex's thread to not split the disscusion. but i have responed inline with some example of how openstack schdules and what i ment by different mdev_types On Tue, 2020-07-14 at 20:29 +0100, Sean Mooney wrote: > On Tue, 2020-07-14 at 11:01 -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 13:33:24 +0100 > > Sean Mooney wrote: > > > > > On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote: > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > mdev live migration is completely possible to do but i agree with Dan barrange's comments > > > from the point of view of openstack integration i dont see calling out to a vender sepecific > > > tool to be an accpetable > > > > As I replied to Dan, I'm hoping Yan was referring more to vendor > > specific knowledge rather than actual tools. > > > > > solutions for device compatiablity checking. the sys filesystem > > > that describs the mdevs that can be created shoudl also > > > contain the relevent infomation such > > > taht nova could integrate it via libvirt xml representation or directly retrive the > > > info from > > > sysfs. > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > so vf to vf migration is not possible in the general case as there is no standarised > > > way to transfer teh device state as part of the siorv specs produced by the pci-sig > > > as such there is not vender neutral way to support sriov live migration. > > > > We're not talking about a general case, we're talking about physical > > devices which have vfio wrappers or hooks with device specific > > knowledge in order to support the vfio migration interface. The point > > is that a discussion around vfio device migration cannot be limited to > > mdev devices. > > ok upstream in openstack at least we do not plan to support generic livemigration > for passthough devivces. we cheat with network interfaces since in generaly operating > systems handel hotplug of a nic somewhat safely so wehre no abstraction layer like > an mdev is present or a macvtap device we hot unplug the nic before the migration > and attach a new one after. for gpus or crypto cards this likely would not be viable > since you can bond generic hardware devices to hide the removal and readdtion of a generic > pci device. we were hoping that there would be a convergenca around MDEVs as a way to provide > that abstraction going forward for generic device or some other new mechanisum in the future. 
> > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > that also makes this unviable > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > if one device is able to migrate to another device before triggering a real > > > > > live migration procedure. > > > > > > well actully that is already too late really. ideally we would want to do this compaiablity > > > check much sooneer to avoid the migration failing. in an openstack envionment at least > > > by the time we invoke libvirt (assuming your using the libvirt driver) to do the migration we have alreaedy > > > finished schduling the instance to the new host. if if we do the compatiablity check at this point > > > and it fails then the live migration is aborted and will not be retired. These types of late check lead to a > > > poor user experince as unless you check the migration detial it basically looks like the migration was ignored > > > as it start to migrate and then continuge running on the orgininal host. > > > > > > when using generic pci passhotuhg with openstack, the pci alias is intended to reference a single vendor > > > id/product > > > id so you will have 1+ alias for each type of device. that allows openstack to schedule based on the availability > > > of > > > a > > > compatibale device because we track inventories of pci devices and can query that when selecting a host. > > > > > > if we were to support mdev live migration in the future we would want to take the same declarative approch. > > > 1 interospec the capability of the deivce we manage > > > 2 create inventories of the allocatable devices and there capabilities > > > 3 schdule the instance to a host based on the device-type/capabilities and claim it atomicly to prevent raceces > > > 4 have the lower level hyperviors do addtional validation if need prelive migration. > > > > > > this proposal seams to be targeting extending step 4 where as ideally we should focuse on providing the info that > > > would > > > be relevant in set 1 preferably in a vendor neutral way vai a kernel interface like /sys. > > > > I think this is reading a whole lot into the phrase "last step". We > > want to make the information available for a management engine to > > consume as needed to make informed decisions regarding likely > > compatible target devices. > > well openstack as a management engin has 3 stages for schdule and asignment,. > in respocne to a live migration request the api does minimal valaidation then hand the task off to the conductor > service > ot orchestrate. the conductor invokes an rpc to the schduler service which makes a rest call to the plamcent service. > the placment cervice generate a set of allocation candiate for host based on qunataive and qulaitivly > queries agains an abstract resouce provider tree model of the hosts. > currently device pasthough is not modeled in placment so plamcnet is basicaly returning a set of host that have enough > cpu ram and disk for the instance. in the spacial of vGPU they technically are modelled in placement but not in a way > that would gurarentee compatiablity for migration. a generic pci device request is haneled in the second phase of > schduling called filtering and weighing. 
in this pahse the nova schuleer apply a series of filter to the list of host > returned by plamcnet to assert things like anit afintiy, tenant isolation or in the case of this converation nuam > affintiy and pci device avaiablity. when we have filtered the posible set of host down to X number we weigh the > listing > to select an optimal host and set of alternitive hosts. we then enter the code that this mail suggest modfiying which > does an rpc call to the destiation host form teh conductor to have it assert compatiablity which internaly calls back > to > the sourc host. > > so my point is we have done a lot of work by the time we call check_can_live_migrate_destination and failing > at this point is considerd quite a late failure but its still better then failing when qemu actully tries to migrate. > in general we would prefer to move compatiablity check as early in that workflow as possible but to be fair we dont > actully check cpu model compatiablity until check_can_live_migrate_destination. > https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/virt/libvirt/driver.py#L8325-L8331 > > if we needed too we could read the version string on the source and write the version string on the dest at this > point. > doing so however would be considerd, inelegant, we have found this does not scale as the first copmpatabilty check. > for cpu for example there are way to filter hosts by groups sets fo host with the same cpu or filtering on cpu feature > flags that happen in the placment or filter stage both of which are very early and cheap to do at runtime. > > the "read for version, write for compatibility" workflow could be used as a final safe check if required but > probing for compatibility via writes is basicaly considered an anti patteren in openstack. we try to always > assert compatibility by reading avaiable info and asserting requirement over it not testing to see if it works. > > this has come up in the past in the context of virtio feature flag where the idea of spawning an instrance or trying > to add a virtio port to ovs dpdk that reqested a specific feature flag was rejected as unacceptable from a performance > and security point of view. > > > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > this might be useful as we could tag the inventory with the migration version and only might to > > > devices with the same version > > > > Is cross version compatibility something that you'd consider using? > > yes but it would depend on what cross version actully ment. > > the version of an mdev is not something we would want to be exposed to endusers. > it would be a security risk to do so as the version sting would potentaily allow the untrused user > to discover if a device has an unpatch vulnerablity. 
as a result in the context of live migration > we can only support cross verion compatiabilyt if the device in the guest does not alter as > part of the migration and the behavior does not change. > > going form version 1.0 with feature X to verions 1.1 with feature X and Y but only X enabled would > be fine. going gorm 1.0 to 2.0 where thre is only feature Y would not be ok. > being abstract makes it a little harder to readabout but i guess i would sumerisei if its > transparent to the guest for the lifetime of the qemu process then its ok for the backing version to change. > if a vm is rebooted its also ok fo the vm to pick up feature Y form the 1.1 device although at that point > it could not be migrated back to the 1.0 host as it now has feature X and Y and 1.0 only has X so that woudl be > an obserable change if it was drop as a reult of the live migration. > > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > this would not be useful as the schduler cannot directlly connect to the compute host > > > and even if it could it would be extreamly slow to do this for 1000s of hosts and potentally > > > multiple devices per host. > > > > Seems similar to Dan's requirement, looks like the 'read for version, > > write for compatibility' test idea isn't really viable. > > its ineffiecnt and we have reject adding such test in the case of virtio-feature flag compatiabilty > in the past, so its more an option of last resourt if we have no other way to support compatiablity > checking. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > driver and is completely opaque to the userspace. > > > > > > opaque vendor specific stings that higher level orchestros have to pass form host > > > to host and cant reason about are evil, when allowed they prolifroate and > > > makes any idea of a vendor nutral abstraction and interoperablity between systems > > > impossible to reason about. that said there is a way to make it opaue but still useful > > > to userspace. see below > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > honestly i would much prefer if the version string was just a semver string. > > > e.g. {major}.{minor}.{bugfix} > > > > > > if you do a driver/frimware update and break compatiablity with an older version bump the > > > major version. 
> > > > > > if you add optional a feature that does not break backwards compatiablity if you migrate > > > an older instance to the new host then just bump the minor/feature number. > > > > > > if you have a fix for a bug that does not change the feature set or compatiblity backwards or > > > forwards then bump the bugfix number > > > > > > then the check is as simple as > > > 1.) is the mdev type the same > > > 2.) is the major verion the same > > > 3.) am i going form the same version to same version or same version to newer version > > > > > > if all 3 are true we can migrate. > > > e.g. > > > 2.0.1 -> 2.1.1 (ok same major version and migrating from older feature release to newer feature release) > > > 2.1.1 -> 2.0.1 (not ok same major version and migrating from new feature release to old feature release may be > > > incompatable) > > > 2.0.0 -> 3.0.0 (not ok chaning major version) > > > 2.0.1 -> 2.0.0 (ok same major and minor version, all bugfixs in the same minor release should be compatibly) > > > > What's the value of the bugfix field in this scheme? > > its not require but really its for a non visable chagne form a feature standpoint. > a rather contrived example but if it was quadratic to inital a set of queues or device bufferes > in 1.0.0 and you made it liniar in 1.0.1 that is a performace improvment in the device intialisation time > which is great but it would not affect the feature set or compatiablity in any way. you could call it > a feature but its really just an internal change but you might want to still bump the version number. > > > > The simplicity is good, but is it too simple. It's not immediately > > clear to me whether all features can be hidden behind a minor version. > > For instance, if we have an mdev device that supports this notion of > > aggregation, which is proposed as a solution to the problem that > > physical hardware might support lots and lots of assignable interfaces > > which can be combined into arbitrary sets for mdev devices, making it > > impractical to expose an mdev type for every possible enumeration of > > assignable interfaces within a device. > > so this is a modeling problem and likely a limitation of the current way an mdev_type is exposed. > stealing some linux doc eamples > > > |- [parent physical device] > |--- Vendor-specific-attributes [optional] > |--- [mdev_supported_types] > | |--- [] > | | |--- create > | | |--- name > | | |--- available_instances > | | |--- device_api > | | |--- description > > you could adress this in 1 of at least 3 ways. > 1.) mdev type for each enmartion which is fine for 1-2 variabley othersize its a combinitroial explotions. > 2.) report each of the consomable sub componetns as an mdev type and create mupltipel mdevs and assign them to the vm. > 3.) provider an api to dynamically compose mdevs types which staticaly partion the reqouese and can then be consomed > perferably embeding the resouce infomation in the description filed in a huma/machince readable form. > > 2 and 3 woudl work well with openstack however they both have there challanges > 1 doesnt really work for anyone out side of a demo. > > We therefore expose a base type > > where the aggregation is built later. This essentially puts us in a > > scenario where even within an mdev type running on the same driver, > > there are devices that are not directly compatible with each other. > > > > > we dont need vendor to rencode the driver name or vendor id and product id in the string. 
that info is alreay > > > available both to the device driver and to userspace via /sys already we just need to know if version of > > > the same mdev are compatiable so a simple semver version string which is well know in the software world > > > at least is a clean abstration we can reuse. > > > > This presumes there's no cross device migration. > > no but it does assume no cross mdev_type migration. > it assuems that nvida_mdev_type_x on host 1 is the same as nvida_mdev_type_x on host 2. > if the parent device differese but support the same mdev type we are asserting that they > should be compatiable or a differnt mdev_type name should be used on each device. > > so we are presuming the mdev type cant change as part of a live migration and if the type > was to change it would no longer be a live migration operation it would be something else. > that is based on the premis that changing the mdev type would change the capabilities of the mdev > > > An mdev type can only > > be migrated to the same mdev type, all of the devices within that type > > have some based compatibility, a phsyical device can only be migrated to > > the same physical device. In the latter case what defines the type? > > the type-id in /sysfs > > /sys/devices/virtual/mtty/mtty/ > |-- mdev_supported_types > | |-- mtty-1 <---- this is an mdev type > | | |-- available_instances > | | |-- create > | | |-- device_api > | | |-- devices > | | `-- name > | `-- mtty-2 <---- as is this > | |-- available_instances > | |-- create > | |-- device_api > | |-- devices > | `-- name > > |- [parent phy device] > |--- [$MDEV_UUID] > |--- remove > |--- mdev_type {link to its type} <-- here > |--- vendor-specific-attributes [optional] > > > If > > it's a PCI device, is it only vendor:device IDs? > > no the mdev type is not defined by the vendor:device id of the parent device > although the capablityes of that device will determin what mdev types if any it supprots. > > What about revision? > > What about subsystem IDs? > > at least for nvidia gpus i dont think if you by an evga branded v100 vs an pny branded one the capability > would change but i do know that certenly the capablities of a dell branding intel nic and an intel branded > one can. e.g. i have seen oem sku nics without sriov eventhoguh the same nic form intel supports it. > sriov was deliberatly disabled in the dell firmware even though it share dhte same vendor and prodcut id but differnt > subsystem id. > > if the odm made an incomatipable change like that which affect an mdev type in some way i guess i would expect them to > change the name or the description filed content to signal that. > > > What about possibly an onboard ROM or > > internal firmware? > > i would expect that updating the firmware/rom could result in changing a version string. that is how i was imagining > it would change. > > The information may be available, but which things > > are relevant to migration? > > that i dont know an i really would not like to encode that knolage in the vendor specific way in higher level > tools like openstack or even libvirt. declarative version sting comparisons or even simile feature flag > check where an abstract huristic that can be applied across vendors would be fine. but yes i dont know > what info would be needed in this case. > > We already see desires to allow migration > > between physical and mdev, > > migration between a phsical device and an mdev would not generally be considered a live migration in openstack. 
> that would be a different operation as it would be user visible withing the guest vm. > > but also to expose mdev types that might be > > composable to be compatible with other types. Thanks, > > i think composable mdev types are really challanging without some kind of feature flag concept > like cpu flags or ethtool nic capablities that are both human readable and easily parsable. > > we have the capability to schedule on cpu flags or gpu cuda level using a traits abstraction > so instead of saying i want an vm on a host with an intel 2695v3 to ensure it has AVX > you say i want an vm that is capable of using AVX > https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86/__init__.py#L18 > > we also have trait for cuda level so instead of asking for a specifc mdev type or nvida > gpu the idea was you woudl describe what feature cuda in this exmple you need > https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/cuda.py#L16-L45 > > That is what we call qualitative schudleing and is why we create teh placement service. > with out going in to the weeds we try to decouple quantaitive request such as 4 cpus and 1G of ram > form the qunative i need AVX supprot > > e.g. resouces:VCPU=4,resouces:MEMORY_MB=1024 triats:required=HW_CPU_X86_AVX > > declarative quantitive and capablites reporting of resouces fits easily into that model. > dynamic quantities that change as other mdev are allocated from the parent device or as > new mdevs types are composed on the fly are very challenging. > > > > > Alex > > > > From soulxu at gmail.com Wed Jul 15 07:23:42 2020 From: soulxu at gmail.com (Alex Xu) Date: Wed, 15 Jul 2020 15:23:42 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714101616.5d3a9e75@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> Message-ID: Alex Williamson 于2020年7月15日周三 上午12:16写道: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps > upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the > two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to > check > > > if one device is able to migrate to another device before triggering a > real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please > don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > sysfs node. e.g. > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). 
> > > userspace tools read the migration_version as a string from the source > device, > > > and write it to the migration_version sysfs attribute in the target > device. > > > > > > The userspace should treat ANY of below conditions as two devices not > compatible: > > > - any one of the two devices does not have a migration_version > attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device > vendor > > > driver and is completely opaque to the userspace. > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > If the "configured remote storage URL" is something configuration setting before the usage, then it isn't something we need for migration compatible check. Openstack only needs to know the target device's driver and hardware compatible for migration, then the scheduler will choose a host which such device, and then Openstack will pre-configure the target host and target device before the migration, then openstack will configure the correct remote storage URL to the device. If we want, we can do a sanity check after the live migration with the os. > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. > > > > (2) backgrounds > > > > > > The reason we hope the migration_version string is opaque to the > userspace > > > is that it is hard to generalize standard comparing fields and > comparing > > > methods for different devices from different vendors. > > > Though userspace now could still do a simple string compare to check if > > > two devices are compatible, and result should also be right, it's still > > > too limited as it excludes the possible candidate whose > migration_version > > > string fails to be equal. > > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably > compatible > > > with another MDEV with mdev_type_3, aggregator count 1, even their > > > migration_version strings are not equal. > > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > > > besides that, driver version + configured resources are all elements > demanding > > > to take into account. > > > > > > So, we hope leaving the freedom to vendor driver and let it make the > final decision > > > in a simple reading from source side and writing for test in the > target side way. 
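As an illustration of the read-from-source and write-for-test flow described above, a minimal python sketch could look like the following; the sysfs paths and device UUIDs are invented for the example, only the read/write/error-handling pattern follows the proposal:

import os

# Hypothetical sysfs attributes for a source device and a candidate target
# device; real paths depend on the bus and driver, e.g. an mdev node under
# its parent PCI device.
SRC_ATTR = "/sys/bus/pci/devices/0000:00:02.0/11111111-2222-3333-4444-555555555555/migration_version"
DST_ATTR = "/sys/bus/pci/devices/0000:00:02.0/66666666-7777-8888-9999-000000000000/migration_version"

def devices_compatible(src_attr, dst_attr):
    # Per the proposal, a missing attribute, a read error or a write error
    # all mean "not compatible"; the string itself stays opaque to userspace.
    if not os.path.exists(src_attr) or not os.path.exists(dst_attr):
        return False
    try:
        with open(src_attr) as f:
            version = f.read().strip()
        with open(dst_attr, "w") as f:
            f.write(version)
    except OSError:
        return False
    return True

print(devices_compatible(SRC_ATTR, DST_ATTR))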
> > > > > > > > > we then think the device compatibility issues for live migration with > assigned > > > devices can be divided into two steps: > > > a. management tools filter out possible migration target devices. > > > Tags could be created according to info from product specification. > > > we think openstack/ovirt may have vendor proprietary components to > create > > > those customized tags for each product from each vendor. > > > > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags > to > > > search target vGPU are like: > > > a tag for compatible parent PCI IDs, > > > a tag for a range of gvt driver versions, > > > a tag for a range of mdev type + aggregator count > > > > > > for NVMe VF, the tags to search target VF may be like: > > > a tag for compatible PCI IDs, > > > a tag for a range of driver versions, > > > a tag for URL of configured remote storage. > > I interpret this as hand waving, ie. the first step is for management > tools to make a good guess :-\ We don't seem to be willing to say that > a given mdev type can only migrate to a device with that same type. > There's this aggregation discussion happening separately where a base > mdev type might be created or later configured to be equivalent to a > different type. The vfio migration API we've defined is also not > limited to mdev devices, for example we could create vendor specific > quirks or hooks to provide migration support for a physical PF/VF > device. Within the realm of possibility then is that we could migrate > between a physical device and an mdev device, which are simply > different degrees of creating a virtualization layer in front of the > device. > > > Requiring management application developers to figure out this possible > > compatibility based on prod specs is really unrealistic. Product specs > > are typically as clear as mud, and with the suggestion we consider > > different rules for different types of devices, add up to a huge amount > > of complexity. This isn't something app developers should have to spend > > their time figuring out. > > Agreed. > > > The suggestion that we make use of vendor proprietary helper components > > is totally unacceptable. We need to be able to build a solution that > > works with exclusively an open source software stack. > > I'm surprised to see this as well, but I'm not sure if Yan was really > suggesting proprietary software so much as just vendor specific > knowledge. > > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > what versions are supported on a given device. This puts the job of > > reporting compatible versions directly under the responsibility of the > > vendor who writes the kernel driver for it. They are the ones with the > > best knowledge of the hardware they've built and the rules around its > > compatibility. > > The version string discussed previously is the version string that > represents a given device, possibly including driver information, > configuration, etc. I think what you're asking for here is an > enumeration of every possible version string that a given device could > accept as an incoming migration stream. If we consider the string as > opaque, that means the vendor driver needs to generate a separate > string for every possible version it could accept, for every possible > configuration option. That potentially becomes an excessive amount of > data to either generate or manage. 
For the configuration options, there are two kinds of configuration options are needn't for the migration check. * The configuration option makes the device different, for example(could be wrong example, not matching any real hardware), A GPU supports 1024* 768 resolution and 800 * 600 resolution VGPUs, the OpenStack will separate this two kinds of VGPUs into two separate resource pool. so the scheduler already ensures we get a host with such vGPU support. so it needn't encode into the 'version string' discussed here. * The configuration option is setting before usage, just like the 'configured remote storage URL' above, it needn't encoded into the 'version string' also. Since the openstack will configure the correct value before the migration. > Am I overestimating how vendors intend to use the version string? > > We'd also need to consider devices that we could create, for instance > providing the same interface enumeration prior to creating an mdev > device to have a confidence level that the new device would be a valid > target. > > We defined the string as opaque to allow vendor flexibility and because > defining a common format is hard. Do we need to revisit this part of > the discussion to define the version string as non-opaque with parsing > rules, probably with separate incoming vs outgoing interfaces? Thanks, > > Alex > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Wed Jul 15 07:37:19 2020 From: soulxu at gmail.com (Alex Xu) Date: Wed, 15 Jul 2020 15:37:19 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714145948.17b95eb3@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: Alex Williamson 于2020年7月15日周三 上午5:00写道: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that > helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of > the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to > check > > > > > if one device is able to migrate to another device before > triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. > please don't > > > > > hesitate to drop your valuable comments. 
> > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > > > sysfs node. e.g. > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the > source device, > > > > > and write it to the migration_version sysfs attribute in the > target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices > not compatible: > > > > > - any one of the two devices does not have a migration_version > attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by > device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > driver name to > > > > > each migration_version string. e.g. > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver > version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. 
> > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > The OpenStack Placement service doesn't support to filtering the target host by the semver syntax, altough we can code this filtering logic inside scheduler filtering by python code. Basically, placement only supports filtering the host by traits (it is same thing with labels, tags). The nova scheduler will call the placement service to filter the hosts first, then go through all the scheduler filters. That would be great if the placement service can filter out more hosts which isn't compatible first, and then it is better. > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > OpenStack already based on vendor and device id to separate the devices into the different resource pool, then the scheduler based on that to filer the hosts, so I think it needn't be the part of this compatibility string. > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > Since the placement support traits (labels, tags), so the placement just to matching those fields, so it isn't problem of openstack, since openstack needn't to know the meaning of those fields. But the traits is just a label, it isn't key-value format. But also if we have to, we can code this scheduler filter by python code. But the same thing as above, the invalid host can't be filtered out in the first step placement service filtering. > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only determined > between identical builds? Thanks, > > Alex > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgilbert at redhat.com Wed Jul 15 08:23:09 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Wed, 15 Jul 2020 09:23:09 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714145948.17b95eb3@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: <20200715082309.GC2864@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. 
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > if one device is able to migrate to another device before triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the source device, > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. 
> > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only determined > between identical builds? Thanks, No, I'd mostly anticipated matching on the vendor and device and maybe a version number for the bit the user specifies; I had assumed all that 'vendor cruft' was still mostly opaque; having said that, if it did become a list of attributes like that (some of which were vendor specific) that would make sense to me. Dave > > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From yan.y.zhao at intel.com Wed Jul 15 08:20:41 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 15 Jul 2020 16:20:41 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714145948.17b95eb3@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: <20200715082040.GA13136@joy-OptiPlex-7040> On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. 
we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > if one device is able to migrate to another device before triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the source device, > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. 
> > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only determined > between identical builds? Thanks, > Yes, I agree kernel could expose such sysfs interface to educate openstack how to filter out devices. But I still think the proposed migration_version (or rename to migration_compatibility) interface is still required for libvirt to do double check. In the following scenario: 1. openstack chooses the target device by reading sysfs interface (of json format) of the source device. And Openstack are now pretty sure the two devices are migration compatible. 2. openstack asks libvirt to create the target VM with the target device and start live migration. 3. libvirt now receives the request. so it now has two choices: (1) create the target VM & target device and start live migration directly (2) double check if the target device is compatible with the source device before doing the remaining tasks. Because the factors to determine whether two devices are live migration compatible are complicated and may be dynamically changing, (e.g. driver upgrade or configuration changes), and also because libvirt should not totally rely on the input from openstack, I think the cost for libvirt is relatively lower if it chooses to go (2) than (1). At least it has no need to cancel migration and destroy the VM if it knows it earlier. So, it means the kernel may need to expose two parallel interfaces: (1) with json format, enumerating all possible fields and comparing methods, so as to indicate openstack how to find a matching target device (2) an opaque driver defined string, requiring write and test in target, which is used by libvirt to make sure device compatibility, rather than rely on the input accurateness from openstack or rely on kernel driver implementing the compatibility detection immediately after migration start. Does it make sense? 
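To make interface (1) a bit more concrete, here is a rough sketch of the field matching a management layer could do over a json style description, reusing the example fields posted earlier in this thread; none of the field names or comparison rules below are a settled format, they are only an illustration:

def json_descriptions_compatible(src, dst):
    # Mirrors the example fields from this thread: device_api, vendor,
    # version.major/minor and an optional vendor specific list. The rule
    # that a higher target minor version is acceptable is an assumption;
    # it only holds if the vendor keeps minor versions backward compatible.
    if src["device_api"] != dst["device_api"]:
        return False
    if src["vendor"] != dst["vendor"]:
        return False
    if src["version"]["major"] != dst["version"]["major"]:
        return False
    if dst["version"]["minor"] < src["version"]["minor"]:
        return False
    for field in src.get("vendor_fields", []):
        if field not in dst.get("vendor_fields", []):
            return False
    return True

src = {"device_api": "vfio-pci", "vendor": "vendor-driver-name",
       "version": {"major": 0, "minor": 1}}
dst = {"device_api": "vfio-pci", "vendor": "vendor-driver-name",
       "version": {"major": 0, "minor": 2}}
print(json_descriptions_compatible(src, dst))  # True under the assumptions above

The opaque migration_version write-and-test of interface (2) would then stay as the final check done by libvirt just before the migration stream starts.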
Thanks Yan From shaohe.feng at intel.com Wed Jul 15 08:49:06 2020 From: shaohe.feng at intel.com (Feng, Shaohe) Date: Wed, 15 Jul 2020 08:49:06 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200715082040.GA13136@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> Message-ID: <7B5303F69BB16B41BB853647B3E5BD70600BB667@SHSMSX104.ccr.corp.intel.com> -----Original Message----- From: Zhao, Yan Y Sent: 2020年7月15日 16:21 To: Alex Williamson Cc: Dr. David Alan Gilbert ; Daniel P. Berrangé ; devel at ovirt.org; openstack-discuss at lists.openstack.org; libvir-list at redhat.com; intel-gvt-dev at lists.freedesktop.org; kvm at vger.kernel.org; qemu-devel at nongnu.org; smooney at redhat.com; eskultet at redhat.com; cohuck at redhat.com; dinechin at redhat.com; corbet at lwn.net; kwankhede at nvidia.com; eauger at redhat.com; Ding, Jian-feng ; Xu, Hejie ; Tian, Kevin ; zhenyuw at linux.intel.com; bao.yumeng at zte.com.cn; Wang, Xin-ran ; Feng, Shaohe Subject: Re: device compatibility interface for live migration with assigned devices On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrangé > > > wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface > > > > > that helps upper layer stack like openstack/ovirt/libvirt to > > > > > check if two devices are live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last > > > > > step to check if one device is able to migrate to another > > > > > device before triggering a real live migration procedure. > > > > > we are not sure if this interface is of value or help to you. > > > > > please don't hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under > > > > > each device's sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from > > > > > the source device, and write it to the migration_version sysfs attribute in the target device. 
> > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version > > > > > attribute > > > > > - error when reading from migration_version attribute of one > > > > > device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by > > > > > device vendor driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like "parent > > > > > device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may > > > > > prefix a driver name to each migration_version string. e.g. > > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to > > > describe the contents of that opaque string. The point is that > > > its contents are defined by the vendor driver to describe the > > > device, driver version, and possibly metadata about the > > > configuration of the device. One instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only > > > string the vendor driver will accept, for example the driver might > > > support backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch > > series) > > > > My view is it makes sense to have a half-way house on the opaqueness > > of this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then > > a bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include > > the string and the user to be able to easily identify the device > > that was complaining and notice a difference in versions, and > > perhaps also use it in compatibility patterns to find compatible > > hosts; but that does get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. 
Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only > determined between identical builds? Thanks, > Yes, I agree kernel could expose such sysfs interface to educate openstack how to filter out devices. But I still think the proposed migration_version (or rename to migration_compatibility) interface is still required for libvirt to do double check. In the following scenario: 1. openstack chooses the target device by reading sysfs interface (of json format) of the source device. And Openstack are now pretty sure the two devices are migration compatible. 2. openstack asks libvirt to create the target VM with the target device and start live migration. 3. libvirt now receives the request. so it now has two choices: (1) create the target VM & target device and start live migration directly (2) double check if the target device is compatible with the source device before doing the remaining tasks. Because the factors to determine whether two devices are live migration compatible are complicated and may be dynamically changing, (e.g. driver upgrade or configuration changes), and also because libvirt should not totally rely on the input from openstack, I think the cost for libvirt is relatively lower if it chooses to go (2) than (1). At least it has no need to cancel migration and destroy the VM if it knows it earlier. So, it means the kernel may need to expose two parallel interfaces: (1) with json format, enumerating all possible fields and comparing methods, so as to indicate openstack how to find a matching target device (2) an opaque driver defined string, requiring write and test in target, which is used by libvirt to make sure device compatibility, rather than rely on the input accurateness from openstack or rely on kernel driver implementing the compatibility detection immediately after migration start. Does it make sense? [Feng, Shaohe] Yes, it would be better to have 2 interfaces for the different phases of live migration. For (1), the scheduler can leverage this information to minimize the failure rate of migration. The problem is which values should be used to guide the scheduler; the values should be human readable. For (2), yes, we can't assume that the migration is always successful, so a double check is needed. BR Shaohe Thanks Yan From berrange at redhat.com Wed Jul 15 09:16:41 2020 From: berrange at redhat.com (Daniel P. Berrangé) Date: Wed, 15 Jul 2020 10:16:41 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714144715.0ef70074@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714164722.GL25187@redhat.com> <20200714144715.0ef70074@x1.home> Message-ID: <20200715091641.GD68910@redhat.com> On Tue, Jul 14, 2020 at 02:47:15PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 17:47:22 +0100 > Daniel P. Berrangé wrote: > > I'm sure OpenStack maintainers can speak to this more, as they've put > > alot of work into their scheduling engine to optimize the way it places > > VMs largely driven from simple structured data reported from hosts. > > I think we've weeded out that our intended approach is not worthwhile, > testing a compatibility string at a device is too much overhead, we > need to provide enough information to the management engine to predict > the response without interaction beyond the initial capability probing.
Just to clarify in case people mis-interpreted my POV... I think that testing a compatibility string at a device *is* useful, as it allows for a final accurate safety check to be performed before the migration stream starts. Libvirt could use that reasonably easily I believe. It just isn't sufficient for a complete solution. In parallel with the device level test in sysfs, we need something else to support the host placement selection problems in an efficient way, as you are trying to address in the remainder of your mail. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From soulxu at gmail.com Wed Jul 15 09:21:09 2020 From: soulxu at gmail.com (Alex Xu) Date: Wed, 15 Jul 2020 17:21:09 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200715082040.GA13136@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> Message-ID: Yan Zhao 于2020年7月15日周三 下午4:32写道: > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that > helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid > of the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step > to check > > > > > > if one device is able to migrate to another device before > triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. > please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __ userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > > ________/__________ ___\|/_____________ > > > > > > | migration_version | | migration_version |-->check > migration > > > > > > --------------------- --------------------- compatibility > > > > > > device A device B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > > > > sysfs node. e.g. > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the > source device, > > > > > > and write it to the migration_version sysfs attribute in the > target device. 
> > > > > > > > > > > > The userspace should treat ANY of below conditions as two > devices not compatible: > > > > > > - any one of the two devices does not have a migration_version > attribute > > > > > > - error when reading from migration_version attribute of one > device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by > device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix > a driver name to > > > > > > each migration_version string. e.g. > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver > version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. > > > > In the reply I just sent to Dan, I gave this example of what a > > "compatibility string" might look like represented as json: > > > > { > > "device_api": "vfio-pci", > > "vendor": "vendor-driver-name", > > "version": { > > "major": 0, > > "minor": 1 > > }, > > "vfio-pci": { // Based on above device_api > > "vendor": 0x1234, // Values for the exposed device > > "device": 0x5678, > > // Possibly further parameters for a more specific match > > }, > > "mdev_attrs": [ > > { "attribute0": "VALUE" } > > ] > > } > > > > Are you thinking that we might allow the vendor to include a vendor > > specific array where we'd simply require that both sides have matching > > fields and values? ie. 
> > > > "vendor_fields": [ > > { "unknown_field0": "unknown_value0" }, > > { "unknown_field1": "unknown_value1" }, > > ] > > > > We could certainly make that part of the spec, but I can't really > > figure the value of it other than to severely restrict compatibility, > > which the vendor could already do via the version.major value. Maybe > > they'd want to put a build timestamp, random uuid, or source sha1 into > > such a field to make absolutely certain compatibility is only determined > > between identical builds? Thanks, > > > Yes, I agree kernel could expose such sysfs interface to educate > openstack how to filter out devices. But I still think the proposed > migration_version (or rename to migration_compatibility) interface is > still required for libvirt to do double check. > > In the following scenario: > 1. openstack chooses the target device by reading sysfs interface (of json > format) of the source device. And Openstack are now pretty sure the two > devices are migration compatible. > 2. openstack asks libvirt to create the target VM with the target device > and start live migration. > 3. libvirt now receives the request. so it now has two choices: > (1) create the target VM & target device and start live migration directly > (2) double check if the target device is compatible with the source > device before doing the remaining tasks. > > Because the factors to determine whether two devices are live migration > compatible are complicated and may be dynamically changing, (e.g. driver > upgrade or configuration changes), and also because libvirt should not > totally rely on the input from openstack, I think the cost for libvirt is > relatively lower if it chooses to go (2) than (1). At least it has no > need to cancel migration and destroy the VM if it knows it earlier. > If the driver upgrade or configuration changes, I guess there should be a restart of openstack agent on the host, that will update the info to the scheduler. so it should be fine. For (2), probably it need be used for double check when the orchestration layer doesn't implement the check logic in the scheduler. > > So, it means the kernel may need to expose two parallel interfaces: > (1) with json format, enumerating all possible fields and comparing > methods, so as to indicate openstack how to find a matching target device > (2) an opaque driver defined string, requiring write and test in target, > which is used by libvirt to make sure device compatibility, rather than > rely on the input accurateness from openstack or rely on kernel driver > implementing the compatibility detection immediately after migration > start. > > Does it make sense? > > Thanks > Yan > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at infomaniak.com Wed Jul 15 12:13:54 2020 From: zigo at infomaniak.com (Thomas Goirand) Date: Wed, 15 Jul 2020 14:13:54 +0200 Subject: Floating IP's for routed networks In-Reply-To: References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> Message-ID: <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Hi Ryan, If you don't mind, I'm adding the openstack-discuss list in the loop, as this topic may be of interest to others. For mailing list readers, I'm trying to implement this: https://review.opendev.org/#/c/669395/ but I'm having some difficulties. I did a bit of investigation with some added LOG.info() in the code. 
When doing: > openstack subnet create vm-fip \ > --subnet-range 10.66.20.0/24 \ > --service-type 'network:routed' \ > --service-type 'network:floatingip' \ > --network multisegment1 Here's where neutron-api crashes. in db/ipam_backend_mixin.py: def _validate_segment(self, context, network_id, segment_id, action=None, old_segment_id=None): # TODO(tidwellr) Create and use a constant for the service type segments = subnet_obj.Subnet.get_subnet_segment_ids( context, network_id, filtered_service_type='network:routed') associated_segments = set(segments) if None in associated_segments and len(associated_segments) > 1: raise segment_exc.SubnetsNotAllAssociatedWithSegments( network_id=network_id) SubnetsNotAllAssociatedWithSegments() is raised, as you must already guessed. Here's the values... associated_segments is an array containing 3 values: 2 being the IDs of the segments I added previously, the 3rd one being None. This test is then matched. Where is that None value coming from? Is this the new subnet I'm trying to add? Maybe the filtered_service_type='network:routed' in the call: subnet_obj.Subnet.get_subnet_segment_ids() isn't working as expected? Printing the SQL query that is checked shows: SELECT subnets.segment_id AS subnets_segment_id FROM subnets WHERE subnets.network_id = %(network_id_1)s AND subnets.id NOT IN (SELECT subnet_service_types.subnet_id AS subnet_service_types_subnet_id FROM subnet_service_types WHERE subnets.network_id = %(network_id_2)s AND subnet_service_types.subnet_id = subnets.id AND subnet_service_types.service_type = %(service_type_1)s) though when doing by hand: SELECT subnets.segment_id AS subnets_segment_id FROM subnets the db has only 2 subnets, so it looks like the floating-ip subnet got added before the check, and is then removed when the above test fails. So I just removed the raise, and could add the subnet I wanted, but that's obviously not a long term solution. Your thoughts? Another problem that I'm having, is that neutron-bgp-dragent is not receiving (or processing) the messages from neutron-rpc-server. I've enabled DEBUG mode for oslo_messaging, and found out that when dr-agent starts and prints "Agent has just been revived. Scheduling full sync", it does send a message to neutron-rpc-server, which is replied, but it doesn't look like dr-agent processes the return message in its reply queue, and then prints in the logs: "imeout in RPC method get_bgp_speakers. Waiting for 17 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c1b401c9e10d481bb5e071f2c048e480". What is weird is that a few times (rarely), it worked, and the agent gets the reply. What should I do to investigate further? Cheers, Thomas Goirand (zigo) From jonathan at automatia.nl Wed Jul 15 12:34:13 2020 From: jonathan at automatia.nl (Jonathan de Jong) Date: Wed, 15 Jul 2020 14:34:13 +0200 Subject: [ideas] 3 new project drafts Message-ID: <4CEE40B4-74E6-49B1-9933-075C81D2A14C@getmailspring.com> Heya OpenStack community! The openstack-ideas website inspired me to create 3 more ideas, each based on some personal experiences and musings which OpenStack could address. 
Project "Dew": https://review.opendev.org/741008 (low-spec cloud computing) Project "Nebula": https://review.opendev.org/741057 (interface translation for plural or propriatary clouds) Project "Aurora": https://review.opendev.org/741165 (communal/collaborative cloud computing) I need to admit that these drafts are in my opnion extremely rough, very biased, and probably need to be rewritten several times. So that's why I invite people to discuss specifics and implementations of these ideas. If aspects of these ideas are similes of past proposals or projects, which have then been debunked/abandoned, i'm curious as to what discussion has happened for it to be rejected that way. If my language in these drafts are unreadable, confusing, simply too vague, or any other combination of sub-standard writing, please let me know. I plan to expand/improve/detail these project drafts from feedback and comments, please share if any of that comes to mind. Thanks in advance! - Jonathan de Jong From e0ne at e0ne.info Wed Jul 15 14:00:37 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Wed, 15 Jul 2020 17:00:37 +0300 Subject: [horizon] No meeting today Message-ID: Hi team, I can't attend the meeting today, so let's skip it. If you've got any topics to discuss today we can do it in #openstack-horizon IRC channel. Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Jul 15 14:09:53 2020 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 15 Jul 2020 15:09:53 +0100 Subject: Floating IP's for routed networks In-Reply-To: References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Message-ID: Hi Thomas: If I'm not wrong, the goal of this filtering is to remove all those subnets with service_type='network:routed'. Maybe you can check implementing an easier query: SELECT subnets.segment_id AS subnets_segment_id FROM subnets WHERE subnets.network_id = %(network_id_1)s AND NOT (EXISTS (SELECT * FROM subnet_service_types WHERE subnets.id = subnet_service_types.subnet_id AND subnet_service_types.service_type = %(service_type_1)s)) That will be translated to python as: query = test_db.context.session.query(subnet_obj.Subnet.db_model.segment_id) query = query.filter(subnet_obj.Subnet.db_model.network_id == network_id) if filtered_service_type: query = query.filter(~exists().where(and_( subnet_obj.Subnet.db_model.id == service_type_model.subnet_id, service_type_model.service_type == filtered_service_type))) Can you provide a UTs or a way to check the problem you are experiencing? Regards. On Wed, Jul 15, 2020 at 1:27 PM Thomas Goirand wrote: > Sending the message again with the correct From, as I'm not subscribed > to the list with the other mailbox. > > On 7/15/20 2:13 PM, Thomas Goirand wrote: > > Hi Ryan, > > > > If you don't mind, I'm adding the openstack-discuss list in the loop, as > > this topic may be of interest to others. > > > > For mailing list readers, I'm trying to implement this: > > https://review.opendev.org/#/c/669395/ > > but I'm having some difficulties. > > > > I did a bit of investigation with some added LOG.info() in the code. 
> > > > When doing: > > > >> openstack subnet create vm-fip \ > >> --subnet-range 10.66.20.0/24 \ > >> --service-type 'network:routed' \ > >> --service-type 'network:floatingip' \ > >> --network multisegment1 > > > > Here's where neutron-api crashes. in db/ipam_backend_mixin.py: > > > > def _validate_segment(self, context, network_id, segment_id, > > action=None, > > old_segment_id=None): > > # TODO(tidwellr) Create and use a constant for the service type > > segments = subnet_obj.Subnet.get_subnet_segment_ids( > > context, network_id, filtered_service_type='network:routed') > > > > associated_segments = set(segments) > > if None in associated_segments and len(associated_segments) > 1: > > raise segment_exc.SubnetsNotAllAssociatedWithSegments( > > network_id=network_id) > > > > SubnetsNotAllAssociatedWithSegments() is raised, as you must already > > guessed. Here's the values... > > > > associated_segments is an array containing 3 values: 2 being the IDs of > > the segments I added previously, the 3rd one being None. This test is > > then matched. Where is that None value coming from? Is this the new > > subnet I'm trying to add? Maybe the > > filtered_service_type='network:routed' in the call: > > subnet_obj.Subnet.get_subnet_segment_ids() isn't working as expected? > > > > Printing the SQL query that is checked shows: > > > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > > WHERE subnets.network_id = %(network_id_1)s AND subnets.id NOT IN > > (SELECT subnet_service_types.subnet_id AS subnet_service_types_subnet_id > > FROM subnet_service_types > > WHERE subnets.network_id = %(network_id_2)s AND > > subnet_service_types.subnet_id = subnets.id AND > > subnet_service_types.service_type = %(service_type_1)s) > > > > though when doing by hand: > > > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > > > > the db has only 2 subnets, so it looks like the floating-ip subnet got > > added before the check, and is then removed when the above test fails. > > > > So I just removed the raise, and could add the subnet I wanted, but > > that's obviously not a long term solution. > > > > Your thoughts? > > > > Another problem that I'm having, is that neutron-bgp-dragent is not > > receiving (or processing) the messages from neutron-rpc-server. I've > > enabled DEBUG mode for oslo_messaging, and found out that when dr-agent > > starts and prints "Agent has just been revived. Scheduling full sync", > > it does send a message to neutron-rpc-server, which is replied, but it > > doesn't look like dr-agent processes the return message in its reply > > queue, and then prints in the logs: "imeout in RPC method > > get_bgp_speakers. Waiting for 17 seconds before next attempt. If the > > server is not down, consider increasing the rpc_response_timeout option > > as Neutron server(s) may be overloaded and unable to respond quickly > > enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting > > for a reply to message ID c1b401c9e10d481bb5e071f2c048e480". What is > > weird is that a few times (rarely), it worked, and the agent gets the > reply. > > > > What should I do to investigate further? > > > > Cheers, > > > > Thomas Goirand (zigo) > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
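For anyone who wants to poke at the filter Rodolfo suggests without a full neutron tree, the same NOT EXISTS shape can be reproduced with plain SQLAlchemy against two toy tables. This is only a sketch under that assumption: the table definitions are trimmed to the relevant columns and segment_ids_query is a made-up helper, not the real Subnet.get_subnet_segment_ids.

# Standalone sketch, not neutron code: trimmed 'subnets' and
# 'subnet_service_types' tables plus the NOT EXISTS filter discussed above.
from sqlalchemy import (Column, MetaData, String, Table, and_, create_engine,
                        exists, select)

metadata = MetaData()
subnets = Table(
    'subnets', metadata,
    Column('id', String(36), primary_key=True),
    Column('network_id', String(36)),
    Column('segment_id', String(36), nullable=True))
subnet_service_types = Table(
    'subnet_service_types', metadata,
    Column('subnet_id', String(36)),
    Column('service_type', String(255)))

def segment_ids_query(network_id, filtered_service_type='network:routed'):
    # Exclude any subnet that has a row with the filtered service type.
    has_service_type = exists().where(and_(
        subnet_service_types.c.subnet_id == subnets.c.id,
        subnet_service_types.c.service_type == filtered_service_type))
    return select([subnets.c.segment_id]).where(and_(
        subnets.c.network_id == network_id, ~has_service_type))

engine = create_engine('sqlite://')
metadata.create_all(engine)
print(engine.execute(segment_ids_query('net-1')).fetchall())

Filling the toy tables with the two segment subnets plus the service-typed floating-ip subnet makes it easy to see whether a NULL segment_id still comes back, which is what makes the SubnetsNotAllAssociatedWithSegments check fire.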
URL: From CAPSEY at augusta.edu Wed Jul 15 14:17:09 2020
From: CAPSEY at augusta.edu (Apsey, Christopher)
Date: Wed, 15 Jul 2020 14:17:09 +0000
Subject: [nova][dev] Revisiting qemu emulation where guest arch != host arch
Message-ID:

All,

A few years ago I asked a question[1] about why nova, when given a hw_architecture property from glance for an image, would not end up using the correct qemu-system-xx binary when starting the guest process on a compute node if that compute node's architecture did not match the proposed guest architecture. As an example, if we had all x86 hosts, but wanted to run an emulated ppc guest, we should be able to do that given that at least one compute node had qemu-system-ppc already installed and libvirt was successfully reporting that as a supported architecture to nova. It seemed like a heavy lift at the time, so it was put on the back burner.

I am now in a position to fund a contract developer to make this happen, so the question is: would this be a useful blueprint that would potentially be accepted? Most of the time when people want to run an emulated guest they would just nest it inside of an already running guest of the native architecture, but that severely limits observability, and the task of managing any more than a handful of instances in this manner quickly becomes a tangled nightmare of networking, etc. I see real benefit in allowing this scenario to run natively so all of the tooling that exists for fleet management 'just works'. This would also be a significant differentiator for OpenStack as a whole.

Thoughts?

[1] http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html

Chris Apsey
Director | Georgia Cyber Range
GEORGIA CYBER CENTER
100 Grace Hopper Lane | Augusta, Georgia | 30901
https://www.gacybercenter.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From smooney at redhat.com Wed Jul 15 14:36:33 2020
From: smooney at redhat.com (Sean Mooney)
Date: Wed, 15 Jul 2020 15:36:33 +0100
Subject: [nova][dev] Revisiting qemu emulation where guest arch != host arch
In-Reply-To: References: Message-ID:

On Wed, 2020-07-15 at 14:17 +0000, Apsey, Christopher wrote:
> All,
>
> A few years ago I asked a question[1] about why nova, when given a hw_architecture property from glance for an image,
> would not end up using the correct qemu-system-xx binary when starting the guest process on a compute node if that
> compute nodes architecture did not match the proposed guest architecture. As an example, if we had all x86 hosts, but
> wanted to run an emulated ppc guest, we should be able to do that given that at least one compute node had qemu-
> system-ppc already installed and libvirt was successfully reporting that as a supported architecture to nova. It
> seemed like a heavy lift at the time, so it was put on the back burner.
>
> I am now in a position to fund a contract developer to make this happen, so the question is: would this be a useful
> blueprint that would potentially be accepted?

This came up during the PTG and the overall feeling was that it should really work already, and if it does not, it's a bug. So yes, if a blueprint was filed to support emulation based on the image hw_architecture property I don't think you will get objections, although we will probably also want scheduler support for this and report it to placement, or have a weigher of some kind, to make it a complete solution, i.e.
enhance the virt driver to report all the achitecure it support via traits and add a weigher to prefer native execution over emulation. so placement can tell use where it can run and the weigher can say where it will run best. see line 467 https://etherpad.opendev.org/p/nova-victoria-ptg > Most of the time when people want to run an emulated guest they would just nest it inside of an already running > guest of the native architecture, but that severely limits observability and the task of managing any more than a > handful of instances in this manner quickly becomes a tangled nightmare of networking, etc. I see real benefit in > allowing this scenario to run natively so all of the tooling that exists for fleet management 'just works'. This > would also be a significant differentiator for OpenStack as a whole. > > Thoughts? > > [1] > http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html > > Chris Apsey > Director | Georgia Cyber Range > GEORGIA CYBER CENTER > > 100 Grace Hopper Lane | Augusta, Georgia | 30901 > https://www.gacybercenter.org > From nhicher at redhat.com Wed Jul 15 15:40:17 2020 From: nhicher at redhat.com (Nicolas Hicher) Date: Wed, 15 Jul 2020 11:40:17 -0400 Subject: [Tripleo] Planned outage of review.rdoproject.org: 2020-07-15 from 18:00 to 20:00 UTC Message-ID: Hello folks, Our cloud provider plans to do maintainance operation on 2020-07-15 from 18:00 to 20:00 UTC. Service interruption is expected, including: - Zuul CI not running jobs for gerrit, github or opendev. - RDO Trunk not building new packages. - DLRN API. - review.rdoproject.org and softwarefactory-project.io gerrit service. Regards, Nicolas, on behalf of the Software Factory Operation Team From hjensas at redhat.com Wed Jul 15 18:26:55 2020 From: hjensas at redhat.com (Harald Jensas) Date: Wed, 15 Jul 2020 20:26:55 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 absolutely! On Wed, 15 Jul 2020, 14:07 John Fulton, wrote: > +1 I thought he was already a core. > > On Wed, Jul 15, 2020 at 7:05 AM Cédric Jeanneret > wrote: > >> Of course +1! >> >> On 7/14/20 3:30 PM, Emilien Macchi wrote: >> > Hi folks, >> > >> > Rabi has proved deep technical understanding on the TripleO components >> > over the last years. >> > Initially as a major maintainer of the Heat project and then a regular >> > contributor to TripleO, he got involved at different levels: >> > - Optimization of the Heat templates, to reduce the number of resources >> > or improve them to make it faster and more efficient at scale. >> > - Migration of the Mistral workflows into native Ansible modules and >> > Python code into tripleo-common, with end-to-end expertise. >> > - Regular contributions to the container tooling integration. >> > >> > Being involved on the mailing-list and IRC channels, Rabi is always >> > helpful to the community and here to help. >> > He has provided thorough reviews in principal components on TripleO as >> > well as a lot of bug fixes or new features; which contributed to make >> > TripleO more stable and scalable. I would like to propose him be part of >> > the TripleO core team. >> > >> > Thanks Rabi for your hard work! >> > -- >> > Emilien Macchi >> >> -- >> Cédric Jeanneret (He/Him/His) >> Sr. Software Engineer - OpenStack Platform >> Deployment Framework TC >> Red Hat EMEA >> https://www.redhat.com/ >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
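Returning for a moment to the guest-arch emulation thread above: the traits-plus-weigher idea Sean describes can be pictured with a small helper that maps the glance hw_architecture image property onto the HW_ARCH_* traits defined in os-traits. This is only an illustration of the scheduling side, not nova code; the helper name and the hard-coded mapping are assumptions made for the example.

# Hypothetical helper, not part of nova: turn the image's hw_architecture
# property into a placement trait that a pre-filter or weigher could act on.
KNOWN_ARCH_TRAITS = {
    'x86_64': 'HW_ARCH_X86_64',
    'ppc64le': 'HW_ARCH_PPC64LE',
    'aarch64': 'HW_ARCH_AARCH64',
}

def arch_trait_for_image(image_properties):
    # image_properties is assumed to be a plain dict of glance properties,
    # e.g. {'hw_architecture': 'ppc64le'}.
    return KNOWN_ARCH_TRAITS.get(image_properties.get('hw_architecture'))

print(arch_trait_for_image({'hw_architecture': 'ppc64le'}))  # HW_ARCH_PPC64LE

A compute node reporting both HW_ARCH_X86_64 (native) and HW_ARCH_PPC64LE (emulated) would then be a candidate for such a request, and a weigher could still prefer hosts where the requested trait matches the native architecture.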
URL: From thomas.king at gmail.com Wed Jul 15 18:13:15 2020 From: thomas.king at gmail.com (Thomas King) Date: Wed, 15 Jul 2020 12:13:15 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Ruslanas, that would be excellent! I will reply to you directly for details later unless the maillist would like the full thread. Some preliminary questions: - Do you have a separate physical interface for the segment(s) used for your remote subnets? The docs state each segment must have a unique physical network name, which suggests a separate physical interface for each segment unless I'm misunderstanding something. - Are your provisioning segments all on the same Neutron network? - Are you using tagged switchports or access switchports to your Ironic server(s)? Thanks, Tom King On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis wrote: > I have deployed that with tripleO, but now we are recabling and > redeploying it. So once I have it running I can share my configs, just name > which you want :) > > On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: > >> I have. That's the Triple-O docs and they don't go through the normal >> .conf files to explain how it works outside of Triple-O. It has some ideas >> but no running configurations. >> >> Tom King >> >> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >> wrote: >> >>> hi, have you checked: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>> ? >>> I am following this link. I only have one network, having different >>> issues tho ;) >>> >>> >>> >>> On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: >>> >>>> Thank you, Amy! >>>> >>>> Tom >>>> >>>> On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: >>>> >>>>> Hey Tom, >>>>> >>>>> Adding the OpenStack discuss list as I think you got several replies >>>>> from there as well. >>>>> >>>>> Thanks, >>>>> >>>>> Amy (spotz) >>>>> >>>>> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >>>>> wrote: >>>>> >>>>>> Good day, >>>>>> >>>>>> I'm bringing up a thread from June about DHCP relay with neutron >>>>>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>>>>> not have the plain config/neutron config to show how a regular Ironic setup >>>>>> would use DHCP relay. >>>>>> >>>>>> The Neutron segments docs state that I must have a unique physical >>>>>> network name. If my Ironic controller has a single provisioning network >>>>>> with a single physical network name, doesn't this prevent my use of >>>>>> multiple segments? >>>>>> >>>>>> Further, the segments docs state this: "The operator must ensure >>>>>> that every compute host that is supposed to participate in a router >>>>>> provider network has direct connectivity to one of its segments." (section >>>>>> 3 at >>>>>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>>>>> current docs state the same thing) >>>>>> This defeats the purpose of using DHCP relay, though, where the >>>>>> Ironic controller does *not* have direct connectivity to the remote >>>>>> segment. >>>>>> >>>>>> Here is a rough drawing - what is wrong with my thinking here? 
>>>>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP >>>>>> relay <------> Ironic controller, provisioning network: >>>>>> 10.146.29.192/26 VLAN 2115 >>>>>> >>>>>> Thank you, >>>>>> Tom King >>>>>> _______________________________________________ >>>>>> openstack-mentoring mailing list >>>>>> openstack-mentoring at lists.openstack.org >>>>>> >>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>>>>> >>>>> >>> >>> -- >>> Ruslanas Gžibovskis >>> +370 6030 7030 >>> >> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Wed Jul 15 19:07:03 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 15 Jul 2020 22:07:03 +0300 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Hi Thomas, I have a bit complicated setup from tripleo side :) I use only one network (only ControlPlane). thanks to Harold, he helped to make it work for me. Yes, as written in the tripleo docs for leaf networks, it use the same neutron network, different subnets. so neutron network is ctlplane (I think) and have ctlplane-subnet, remote-provision and remote-KI :)) that generates additional lines in "ip r s" output for routing "foreign" subnets through correct gw, if you would have isolated networks, by vlans and ports this would apply for each subnet different gw... I believe you know/understand that part. remote* subnets have dhcp-relay setup by network team... do not ask details for that. I do not know how to, but can ask :) in undercloud/tripleo i have 2 dhcp servers, one is for introspection, another for provide/cleanup and deployment process. all of those subnets have organization level tagged networks and are tagged on network devices, but they are untagged on provisioning interfaces/ports, as in general pxe should be untagged, but some nic's can do vlan untag on nic/bios level. but who cares!? I just did a brief check on your first post, I think I have simmilar setup to yours :)) I will check in around 12hours :)) more deaply, as will be at work :))) P.S. sorry for wrong terms, I am bad at naming. On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: > Ruslanas, that would be excellent! > > I will reply to you directly for details later unless the maillist would > like the full thread. > > Some preliminary questions: > > - Do you have a separate physical interface for the segment(s) used > for your remote subnets? > The docs state each segment must have a unique physical network name, > which suggests a separate physical interface for each segment unless I'm > misunderstanding something. > - Are your provisioning segments all on the same Neutron network? > - Are you using tagged switchports or access switchports to your > Ironic server(s)? > > Thanks, > Tom King > > On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis > wrote: > >> I have deployed that with tripleO, but now we are recabling and >> redeploying it. So once I have it running I can share my configs, just name >> which you want :) >> >> On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: >> >>> I have. That's the Triple-O docs and they don't go through the normal >>> .conf files to explain how it works outside of Triple-O. It has some ideas >>> but no running configurations. 
>>> >>> Tom King >>> >>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >>> wrote: >>> >>>> hi, have you checked: >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>> ? >>>> I am following this link. I only have one network, having different >>>> issues tho ;) >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From beagles at redhat.com Wed Jul 15 20:16:49 2020 From: beagles at redhat.com (Brent Eagles) Date: Wed, 15 Jul 2020 17:46:49 -0230 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 definitely! On Tue, Jul 14, 2020 at 11:03 AM Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Wed Jul 15 21:33:35 2020 From: thomas.king at gmail.com (Thomas King) Date: Wed, 15 Jul 2020 15:33:35 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: That helps a lot, thank you! "I use only one network..." This bit seems to go completely against the Neutron segments documentation. When you have access, please let me know if Triple-O is using segments or some other method. I greatly appreciate this, this is a tremendous help. Tom King On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis wrote: > Hi Thomas, > > I have a bit complicated setup from tripleo side :) I use only one network > (only ControlPlane). thanks to Harold, he helped to make it work for me. > > Yes, as written in the tripleo docs for leaf networks, it use the same > neutron network, different subnets. so neutron network is ctlplane (I > think) and have ctlplane-subnet, remote-provision and remote-KI :)) that > generates additional lines in "ip r s" output for routing "foreign" subnets > through correct gw, if you would have isolated networks, by vlans and ports > this would apply for each subnet different gw... I believe you > know/understand that part. > > remote* subnets have dhcp-relay setup by network team... do not ask > details for that. I do not know how to, but can ask :) > > > in undercloud/tripleo i have 2 dhcp servers, one is for introspection, > another for provide/cleanup and deployment process. > > all of those subnets have organization level tagged networks and are > tagged on network devices, but they are untagged on provisioning > interfaces/ports, as in general pxe should be untagged, but some nic's can > do vlan untag on nic/bios level. 
but who cares!? > > I just did a brief check on your first post, I think I have simmilar setup > to yours :)) I will check in around 12hours :)) more deaply, as will be at > work :))) > > > P.S. sorry for wrong terms, I am bad at naming. > > > On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: > >> Ruslanas, that would be excellent! >> >> I will reply to you directly for details later unless the maillist would >> like the full thread. >> >> Some preliminary questions: >> >> - Do you have a separate physical interface for the segment(s) used >> for your remote subnets? >> The docs state each segment must have a unique physical network name, >> which suggests a separate physical interface for each segment unless I'm >> misunderstanding something. >> - Are your provisioning segments all on the same Neutron network? >> - Are you using tagged switchports or access switchports to your >> Ironic server(s)? >> >> Thanks, >> Tom King >> >> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis >> wrote: >> >>> I have deployed that with tripleO, but now we are recabling and >>> redeploying it. So once I have it running I can share my configs, just name >>> which you want :) >>> >>> On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: >>> >>>> I have. That's the Triple-O docs and they don't go through the normal >>>> .conf files to explain how it works outside of Triple-O. It has some ideas >>>> but no running configurations. >>>> >>>> Tom King >>>> >>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> hi, have you checked: >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>> ? >>>>> I am following this link. I only have one network, having different >>>>> issues tho ;) >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Thu Jul 16 05:38:55 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 15 Jul 2020 22:38:55 -0700 Subject: [manila] No IRC meeting on 16th July 2020 In-Reply-To: References: Message-ID: Hello Zorillas and Interested Stackers, I clearly missed the overlap our IRC meeting had with the OpenStack Community Meeting (invite/information in the latter part of this email). This is an important one for all of us to attend if we can, and toast this amazing community, of which we are a small part. There were no new agenda items added for this week's meeting, so we'll push any discussion items to the next meeting. In the meantime, please note: - a kernel bug in Ubuntu 18.04 [1][2] currently causes test dsvm nodes on RAX to reboot when running the LVM job. We're currently skipping scenario tests in the LVM job to workaround the issue. This bug has been fixed in a new kernel version, we'll re-enable scenario tests when we don't see the issue occurring on RAX. - Manila's new driver deadline is the week of Jul 27 - Jul 31. Please interact with us on #openstack-manila should you have any concern with this deadline. - please review the specifications, they will need to be merged before the next week's meeting Hope to see you at the community meeting! Goutham Pacha Ravi [1] https://launchpad.net/bugs/1886988 [2] https://launchpad.net/bugs/1886668 ---------- Forwarded message --------- From: Sunny Cai Date: Wed, Jul 8, 2020 at 2:38 PM Subject: July OSF Community Meeting - 10 Years of OpenStack To: Hello everyone, You might have heard that OpenStack is turning 10 this year! 
On *Thursday*, *July 16 at 8am PT (1500 UTC)*, we will be holding the 10 years of OpenStack virtual celebration in the July OSF community meeting. I have attached the calendar invite for the July OSF community meeting below. Grab your favorite OpenStack swag and bring your favorite drinks of choice to the meeting on July 16. Let’s do a virtual toast to the 10 incredible years! Please see the etherpad for more meeting information: https://etherpad.opendev.org/p/tTP9ilsAaJ2E8vMnm6uV If you have any questions, please let me know. P.S. To add more fun, feel free to try out the virtual background feature in Zoom. The 10 years of OpenStack virtual background is attached below. Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 16 08:38:07 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 16 Jul 2020 10:38:07 +0200 Subject: [tripleo][centos8][ussuri][horizon] horizon container fails to start Message-ID: Hi all, I have noticed, that horizon container fails to start and some interestin zen_wozniak has apeared [0]. Healthcheck log is empty, but horizon log [1] sais "/usr/bin/python: No such file or directory" and there is no such file or directory :) after sume update it failed. I believe you guys will push update fast enough, as I am still bad at this git and container part.... HOW to fix it now :) on my side? As tripleo will redeploy horizon from images... and will update image. could you please give me a hint where to duck tape it whille it will be pushed to prod? [0] http://paste.openstack.org/show/3jjnsgXfWRxs3o0G6aKH/ [1] http://paste.openstack.org/show/1S66A55cz0UaFUWGxID8/ -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Thu Jul 16 12:56:51 2020 From: zigo at debian.org (Thomas Goirand) Date: Thu, 16 Jul 2020 14:56:51 +0200 Subject: Floating IP's for routed networks In-Reply-To: References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Message-ID: <007d6225-12ef-69d7-6c76-45c093909297@debian.org> On 7/15/20 4:09 PM, Rodolfo Alonso Hernandez wrote: > Hi Thomas: > > If I'm not wrong, the goal of this filtering is to remove all those > subnets with service_type='network:routed'. Maybe you can check > implementing an easier query: > SELECT subnets.segment_id AS subnets_segment_id > FROM subnets > WHERE subnets.network_id = %(network_id_1)s AND NOT (EXISTS (SELECT * > FROM subnet_service_types > WHERE subnets.id = subnet_service_types.subnet_id > AND subnet_service_types.service_type = %(service_type_1)s)) > > That will be translated to python as: > > query = test_db.context.session.query(subnet_obj.Subnet.db_model.segment_id) > query = query.filter(subnet_obj.Subnet.db_model.network_id == network_id) > if filtered_service_type: > query = query.filter(~exists().where(and_( > subnet_obj.Subnet.db_model.id == service_type_model.subnet_id, > service_type_model.service_type == filtered_service_type))) > > Can you provide a UTs or a way to check the problem you are experiencing? > > Regards. Hi Rodolfo, Thanks for your help. 
I tried translating what you wrote above into a working code (ie: fixing a few variables here and there), which I sent as a new PR here: https://review.opendev.org/#/c/741429/ However, printing the result from SQLAlchemy shows that get_subnet_segment_ids() still returns None together with my other 2 subnets, so something must still be wrong. I'm not yet to the point I can write unit tests, just trying the code locally for the moment. Cheers, Thomas Goirand (zigo) From jasowang at redhat.com Thu Jul 16 04:16:26 2020 From: jasowang at redhat.com (Jason Wang) Date: Thu, 16 Jul 2020 12:16:26 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200713232957.GD5955@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> Message-ID: <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> On 2020/7/14 上午7:29, Yan Zhao wrote: > hi folks, > we are defining a device migration compatibility interface that helps upper > layer stack like openstack/ovirt/libvirt to check if two devices are > live migration compatible. > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > e.g. we could use it to check whether > - a src MDEV can migrate to a target MDEV, > - a src VF in SRIOV can migrate to a target VF in SRIOV, > - a src MDEV can migration to a target VF in SRIOV. > (e.g. SIOV/SRIOV backward compatibility case) > > The upper layer stack could use this interface as the last step to check > if one device is able to migrate to another device before triggering a real > live migration procedure. > we are not sure if this interface is of value or help to you. please don't > hesitate to drop your valuable comments. > > > (1) interface definition > The interface is defined in below way: > > __ userspace > /\ \ > / \write > / read \ > ________/__________ ___\|/_____________ > | migration_version | | migration_version |-->check migration > --------------------- --------------------- compatibility > device A device B > > > a device attribute named migration_version is defined under each device's > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). Are you aware of the devlink based device management interface that is proposed upstream? I think it has many advantages over sysfs, do you consider to switch to that? > userspace tools read the migration_version as a string from the source device, > and write it to the migration_version sysfs attribute in the target device. > > The userspace should treat ANY of below conditions as two devices not compatible: > - any one of the two devices does not have a migration_version attribute > - error when reading from migration_version attribute of one device > - error when writing migration_version string of one device to > migration_version attribute of the other device > > The string read from migration_version attribute is defined by device vendor > driver and is completely opaque to the userspace. My understanding is that something opaque to userspace is not the philosophy of Linux. Instead of having a generic API but opaque value, why not do in a vendor specific way like: 1) exposing the device capability in a vendor specific way via sysfs/devlink or other API 2) management read capability in both src and dst and determine whether we can do the migration This is the way we plan to do with vDPA. Thanks > for a Intel vGPU, string format can be defined like > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". 
> > for an NVMe VF connecting to a remote storage. it could be > "PCI ID" + "driver version" + "configured remote storage URL" > > for a QAT VF, it may be > "PCI ID" + "driver version" + "supported encryption set". > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > (2) backgrounds > > The reason we hope the migration_version string is opaque to the userspace > is that it is hard to generalize standard comparing fields and comparing > methods for different devices from different vendors. > Though userspace now could still do a simple string compare to check if > two devices are compatible, and result should also be right, it's still > too limited as it excludes the possible candidate whose migration_version > string fails to be equal. > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > with another MDEV with mdev_type_3, aggregator count 1, even their > migration_version strings are not equal. > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > besides that, driver version + configured resources are all elements demanding > to take into account. > > So, we hope leaving the freedom to vendor driver and let it make the final decision > in a simple reading from source side and writing for test in the target side way. > > > we then think the device compatibility issues for live migration with assigned > devices can be divided into two steps: > a. management tools filter out possible migration target devices. > Tags could be created according to info from product specification. > we think openstack/ovirt may have vendor proprietary components to create > those customized tags for each product from each vendor. > e.g. > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > search target vGPU are like: > a tag for compatible parent PCI IDs, > a tag for a range of gvt driver versions, > a tag for a range of mdev type + aggregator count > > for NVMe VF, the tags to search target VF may be like: > a tag for compatible PCI IDs, > a tag for a range of driver versions, > a tag for URL of configured remote storage. > > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > device migration compatibility interface to make sure the two devices are > indeed live migration compatible before launching the real live migration > process to start stream copying, src device stopping and target device > resuming. 
> It is supposed that this step would not bring any performance penalty as > -in kernel it's just a simple string decoding and comparing > -in openstack/ovirt, it could be done by extending current function > check_can_live_migrate_destination, along side claiming target resources.[1] > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > Thanks > Yan > From yan.y.zhao at intel.com Thu Jul 16 08:32:30 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 16 Jul 2020 16:32:30 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> Message-ID: <20200716083230.GA25316@joy-OptiPlex-7040> On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: > > On 2020/7/14 上午7:29, Yan Zhao wrote: > > hi folks, > > we are defining a device migration compatibility interface that helps upper > > layer stack like openstack/ovirt/libvirt to check if two devices are > > live migration compatible. > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > e.g. we could use it to check whether > > - a src MDEV can migrate to a target MDEV, > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > - a src MDEV can migration to a target VF in SRIOV. > > (e.g. SIOV/SRIOV backward compatibility case) > > > > The upper layer stack could use this interface as the last step to check > > if one device is able to migrate to another device before triggering a real > > live migration procedure. > > we are not sure if this interface is of value or help to you. please don't > > hesitate to drop your valuable comments. > > > > > > (1) interface definition > > The interface is defined in below way: > > > > __ userspace > > /\ \ > > / \write > > / read \ > > ________/__________ ___\|/_____________ > > | migration_version | | migration_version |-->check migration > > --------------------- --------------------- compatibility > > device A device B > > > > > > a device attribute named migration_version is defined under each device's > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > Are you aware of the devlink based device management interface that is > proposed upstream? I think it has many advantages over sysfs, do you > consider to switch to that? not familiar with the devlink. will do some research of it. > > > > userspace tools read the migration_version as a string from the source device, > > and write it to the migration_version sysfs attribute in the target device. > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > - any one of the two devices does not have a migration_version attribute > > - error when reading from migration_version attribute of one device > > - error when writing migration_version string of one device to > > migration_version attribute of the other device > > > > The string read from migration_version attribute is defined by device vendor > > driver and is completely opaque to the userspace. > > > My understanding is that something opaque to userspace is not the philosophy but the VFIO live migration in itself is essentially a big opaque stream to userspace. > of Linux. 
Instead of having a generic API but opaque value, why not do in a > vendor specific way like: > > 1) exposing the device capability in a vendor specific way via sysfs/devlink > or other API > 2) management read capability in both src and dst and determine whether we > can do the migration > > This is the way we plan to do with vDPA. > yes, in another reply, Alex proposed to use an interface in json format. I guess we can define something like { "self" : [ { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v1", "mdev_type" : "i915-GVTg_V5_2", "aggregator" : "1", "pv-mode" : "none", } ], "compatible" : [ { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v1", "mdev_type" : "i915-GVTg_V5_2", "aggregator" : "1" "pv-mode" : "none", }, { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v1", "mdev_type" : "i915-GVTg_V5_4", "aggregator" : "2" "pv-mode" : "none", }, { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v2", "mdev_type" : "i915-GVTg_V5_4", "aggregator" : "2" "pv-mode" : "none, ppgtt, context", } ... ] } But as those fields are mostly vendor specific, the userspace can only do simple string comparing, I guess the list would be very long as it needs to enumerate all possible targets. also, in some fileds like "gvt-version", is there a simple way to express things like v2+? If the userspace can read this interface both in src and target and check whether both src and target are in corresponding compatible list, I think it will work for us. But still, kernel should not rely on userspace's choice, the opaque compatibility string is still required in kernel. No matter whether it would be exposed to userspace as an compatibility checking interface, vendor driver would keep this part of code and embed the string into the migration stream. so exposing it as an interface to be used by libvirt to do a safety check before a real live migration is only about enabling the kernel part of check to happen ahead. Thanks Yan > > > > for a Intel vGPU, string format can be defined like > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > for an NVMe VF connecting to a remote storage. it could be > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > for a QAT VF, it may be > > "PCI ID" + "driver version" + "supported encryption set". > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > (2) backgrounds > > > > The reason we hope the migration_version string is opaque to the userspace > > is that it is hard to generalize standard comparing fields and comparing > > methods for different devices from different vendors. > > Though userspace now could still do a simple string compare to check if > > two devices are compatible, and result should also be right, it's still > > too limited as it excludes the possible candidate whose migration_version > > string fails to be equal. > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > with another MDEV with mdev_type_3, aggregator count 1, even their > > migration_version strings are not equal. > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > besides that, driver version + configured resources are all elements demanding > > to take into account. 
> > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > in a simple reading from source side and writing for test in the target side way. > > > > > > we then think the device compatibility issues for live migration with assigned > > devices can be divided into two steps: > > a. management tools filter out possible migration target devices. > > Tags could be created according to info from product specification. > > we think openstack/ovirt may have vendor proprietary components to create > > those customized tags for each product from each vendor. > > e.g. > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > search target vGPU are like: > > a tag for compatible parent PCI IDs, > > a tag for a range of gvt driver versions, > > a tag for a range of mdev type + aggregator count > > > > for NVMe VF, the tags to search target VF may be like: > > a tag for compatible PCI IDs, > > a tag for a range of driver versions, > > a tag for URL of configured remote storage. > > > > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > > device migration compatibility interface to make sure the two devices are > > indeed live migration compatible before launching the real live migration > > process to start stream copying, src device stopping and target device > > resuming. > > It is supposed that this step would not bring any performance penalty as > > -in kernel it's just a simple string decoding and comparing > > -in openstack/ovirt, it could be done by extending current function > > check_can_live_migrate_destination, along side claiming target resources.[1] > > > > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > > > Thanks > > Yan > > > From jasowang at redhat.com Thu Jul 16 09:30:41 2020 From: jasowang at redhat.com (Jason Wang) Date: Thu, 16 Jul 2020 17:30:41 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200716083230.GA25316@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> Message-ID: On 2020/7/16 下午4:32, Yan Zhao wrote: > On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: >> On 2020/7/14 上午7:29, Yan Zhao wrote: >>> hi folks, >>> we are defining a device migration compatibility interface that helps upper >>> layer stack like openstack/ovirt/libvirt to check if two devices are >>> live migration compatible. >>> The "devices" here could be MDEVs, physical devices, or hybrid of the two. >>> e.g. we could use it to check whether >>> - a src MDEV can migrate to a target MDEV, >>> - a src VF in SRIOV can migrate to a target VF in SRIOV, >>> - a src MDEV can migration to a target VF in SRIOV. >>> (e.g. SIOV/SRIOV backward compatibility case) >>> >>> The upper layer stack could use this interface as the last step to check >>> if one device is able to migrate to another device before triggering a real >>> live migration procedure. >>> we are not sure if this interface is of value or help to you. please don't >>> hesitate to drop your valuable comments. 
>>> >>> >>> (1) interface definition >>> The interface is defined in below way: >>> >>> __ userspace >>> /\ \ >>> / \write >>> / read \ >>> ________/__________ ___\|/_____________ >>> | migration_version | | migration_version |-->check migration >>> --------------------- --------------------- compatibility >>> device A device B >>> >>> >>> a device attribute named migration_version is defined under each device's >>> sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). >> >> Are you aware of the devlink based device management interface that is >> proposed upstream? I think it has many advantages over sysfs, do you >> consider to switch to that? > not familiar with the devlink. will do some research of it. >> >>> userspace tools read the migration_version as a string from the source device, >>> and write it to the migration_version sysfs attribute in the target device. >>> >>> The userspace should treat ANY of below conditions as two devices not compatible: >>> - any one of the two devices does not have a migration_version attribute >>> - error when reading from migration_version attribute of one device >>> - error when writing migration_version string of one device to >>> migration_version attribute of the other device >>> >>> The string read from migration_version attribute is defined by device vendor >>> driver and is completely opaque to the userspace. >> >> My understanding is that something opaque to userspace is not the philosophy > but the VFIO live migration in itself is essentially a big opaque stream to userspace. I think it's better not limit to the kernel interface for a specific use case. This is basically the device introspection. > >> of Linux. Instead of having a generic API but opaque value, why not do in a >> vendor specific way like: >> >> 1) exposing the device capability in a vendor specific way via sysfs/devlink >> or other API >> 2) management read capability in both src and dst and determine whether we >> can do the migration >> >> This is the way we plan to do with vDPA. >> > yes, in another reply, Alex proposed to use an interface in json format. > I guess we can define something like > > { "self" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1", > "pv-mode" : "none", > } > ], > "compatible" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v2", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none, ppgtt, context", > } > ... > ] > } This is probably another call for devlink base interface. > > But as those fields are mostly vendor specific, the userspace can > only do simple string comparing, I guess the list would be very long as > it needs to enumerate all possible targets. > also, in some fileds like "gvt-version", is there a simple way to express > things like v2+? That's total vendor specific I think. If "v2+" means it only support a version 2+, we can introduce fields like min_version and max_version. But again, the point is to let such interfaces vendor specific instead of trying to have a generic format. 
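To make the min_version/max_version idea concrete, a toy check on the management side could be as small as the sketch below; the caps dictionaries are invented for illustration and are not a format anyone has proposed in this thread.

# Toy example only: the dicts are invented, not a devlink/sysfs format.
def dst_can_accept(dst_caps, src_dev):
    # e.g. dst_caps = {'driver': 'i915', 'min_version': 1, 'max_version': 2}
    #      src_dev  = {'driver': 'i915', 'version': 2}
    return (dst_caps['driver'] == src_dev['driver'] and
            dst_caps['min_version'] <= src_dev['version'] <= dst_caps['max_version'])

print(dst_can_accept({'driver': 'i915', 'min_version': 1, 'max_version': 2},
                     {'driver': 'i915', 'version': 2}))  # True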
> > If the userspace can read this interface both in src and target and > check whether both src and target are in corresponding compatible list, I > think it will work for us. > > But still, kernel should not rely on userspace's choice, the opaque > compatibility string is still required in kernel. No matter whether > it would be exposed to userspace as an compatibility checking interface, > vendor driver would keep this part of code and embed the string into the > migration stream. Why? Can we simply do: 1) Src support feature A, B, C  (version 1.0) 2) Dst support feature A, B, C, D (version 2.0) 3) only enable feature A, B, C in destination in a version specific way (set version to 1.0) 4) migrate metadata A, B, C > so exposing it as an interface to be used by libvirt to > do a safety check before a real live migration is only about enabling > the kernel part of check to happen ahead. If we've already exposed the capability, there's no need for an extra check like compatibility string. Thanks > > > Thanks > Yan > > >> >>> for a Intel vGPU, string format can be defined like >>> "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". >>> >>> for an NVMe VF connecting to a remote storage. it could be >>> "PCI ID" + "driver version" + "configured remote storage URL" >>> >>> for a QAT VF, it may be >>> "PCI ID" + "driver version" + "supported encryption set". >>> >>> (to avoid namespace confliction from each vendor, we may prefix a driver name to >>> each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) >>> >>> >>> (2) backgrounds >>> >>> The reason we hope the migration_version string is opaque to the userspace >>> is that it is hard to generalize standard comparing fields and comparing >>> methods for different devices from different vendors. >>> Though userspace now could still do a simple string compare to check if >>> two devices are compatible, and result should also be right, it's still >>> too limited as it excludes the possible candidate whose migration_version >>> string fails to be equal. >>> e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible >>> with another MDEV with mdev_type_3, aggregator count 1, even their >>> migration_version strings are not equal. >>> (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). >>> >>> besides that, driver version + configured resources are all elements demanding >>> to take into account. >>> >>> So, we hope leaving the freedom to vendor driver and let it make the final decision >>> in a simple reading from source side and writing for test in the target side way. >>> >>> >>> we then think the device compatibility issues for live migration with assigned >>> devices can be divided into two steps: >>> a. management tools filter out possible migration target devices. >>> Tags could be created according to info from product specification. >>> we think openstack/ovirt may have vendor proprietary components to create >>> those customized tags for each product from each vendor. >>> e.g. >>> for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to >>> search target vGPU are like: >>> a tag for compatible parent PCI IDs, >>> a tag for a range of gvt driver versions, >>> a tag for a range of mdev type + aggregator count >>> >>> for NVMe VF, the tags to search target VF may be like: >>> a tag for compatible PCI IDs, >>> a tag for a range of driver versions, >>> a tag for URL of configured remote storage. >>> >>> b. 
with the output from step a, openstack/ovirt/libvirt could use our proposed >>> device migration compatibility interface to make sure the two devices are >>> indeed live migration compatible before launching the real live migration >>> process to start stream copying, src device stopping and target device >>> resuming. >>> It is supposed that this step would not bring any performance penalty as >>> -in kernel it's just a simple string decoding and comparing >>> -in openstack/ovirt, it could be done by extending current function >>> check_can_live_migrate_destination, along side claiming target resources.[1] >>> >>> >>> [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html >>> >>> Thanks >>> Yan >>> From arnaud.morin at gmail.com Thu Jul 16 13:31:27 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 16 Jul 2020 13:31:27 +0000 Subject: [largescale-sig] OpenStack DB Archiver Message-ID: <20200716133127.GA31915@sync> Hello large-scalers! TLDR: we opensource a tool to help reducing size of databases. See https://github.com/ovh/osarchiver/ Few months ago, we released a tool, name osarchiver, which we are using on our production environment (at OVH) to help reduce the size of our tables in mariadb (or mysql) In fact, some tables are well know to grow very quickly. We use it, for example, to clean the OpenStack mistral database from old tasks, actions and executions which are older than a year. Another use case could be to archive some data in another table (e.g. with _archived as suffix) if they are 6 months old, and delete this data after 1 year. The source code of this tool is available here: https://github.com/ovh/osarchiver/ We were wondering if some other users would be interested in using the tool, and maybe move it under the opendev governance? Feel free to contact us and/or answer this thread. Cheers, -- Arnaud, Pierre-Samuel and OVH team From sunny at openstack.org Thu Jul 16 14:03:38 2020 From: sunny at openstack.org (Sunny Cai) Date: Thu, 16 Jul 2020 07:03:38 -0700 Subject: 10 Years of OpenStack Celebration - 8:00am PST (1500 UTC) Message-ID: <44F2A352-8155-4558-AF36-7A1BBB71F7AB@openstack.org> Hello everyone, The 10 years of OpenStack virtual celebration is starting in one hour! Join us and many of the original Stackers who helped form the project back in 2010 to celebrate the past 10 years. The meeting starts today at 8:00am PST (1500 UTC). Please see the etherpad for more meeting information: https://etherpad.opendev.org/p/tTP9ilsAaJ2E8vMnm6uV Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 16 15:09:47 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 16 Jul 2020 17:09:47 +0200 Subject: Kolla Klub Message-ID: Hi Folks, The Kolla Klub is on. Sorry for the late reminder. The meeting url has changed a bit because it cannot be hosted by Mark today. Meeting link: https://meet.google.com/bpx-ymco-cfy Kolla Klub in docs: https://docs.google.com/document/d/1EwQs2GXF-EvJZamEx9vQAOSDB5tCjsDCJyHQN5_4_Sw -yoctozepto From sunny at openstack.org Thu Jul 16 18:39:22 2020 From: sunny at openstack.org (Sunny Cai) Date: Thu, 16 Jul 2020 11:39:22 -0700 Subject: 10 years of OpenStack virtual celebration - recordings Message-ID: Hello everyone, We just had the 10 years of OpenStack virtual celebration with the community members around the world. 
It was a huge success and thanks to everyone who have joined the community meeting. If you have missed the meeting or what a replay, here you can find the meeting recording and the slide deck: 10 years of OpenStack celebration recording: https://www.youtube.com/watch?v=QYhK0219LIk&feature=youtu.be Slide deck: https://docs.google.com/presentation/d/1bPJYOGVDypcXiNaddoPY9o1Wh-thZeuL-y8PPN-ugtY/edit?usp=sharing Check out the 10 years of OpenStack blog here: https://www.openstack.org/blog/thank-you-to-the-last-decade-hello-to-the-next/ Here I have attached a few screenshots from the virtual celebration. Happy 10 years of OpenStack and take care! Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.18 AM.png Type: image/png Size: 3111776 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.25 AM.png Type: image/png Size: 3265961 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.32 AM.png Type: image/png Size: 2672013 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.47 AM.png Type: image/png Size: 3397441 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.53 AM.png Type: image/png Size: 3111248 bytes Desc: not available URL: From anilj.mailing at gmail.com Thu Jul 16 21:41:08 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Thu, 16 Jul 2020 14:41:08 -0700 Subject: RabbitMQ consumer connection is refused when trying to read notifications Message-ID: Hi, I followed the video and the steps provided in this video link and the consumer connection is being refused. https://www.openstack.org/videos/summits/denver-2019/nova-versioned-notifications-the-result-of-a-3-year-journey /etc/nova/nova.conf file changes.. [notifications] notify_on_state_change=vm_state default_level=INFO notification_format=both [oslo_messaging_notifications] driver=messagingv2 transport_url=rabbit://guest:guest at 10.30.8.57:5672/ topics=notification retry=-1 The python consume code is as follows (followed the example provided in the video: transport = oslo_messaging.get_notification_transport( cfg.CONF, url='rabbit://guest:guest at 10.30.8.57:5672/') targets = [ oslo_messaging.Target(topic='versioned_notifications'), ] Am I missing any other configuration in any of the services in OpenStack? Let me know if you need any other info. /anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Jul 16 21:45:32 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 16 Jul 2020 16:45:32 -0500 Subject: [TripleO]Documentation to list all options in yaml file and possible values In-Reply-To: References: Message-ID: <0c59330a-8b7e-694e-4918-340ea4031db7@nemebean.com> /me looks sadly at the unfinished https://specs.openstack.org/openstack/tripleo-specs/specs/pike/environment-generator.html One of the goals of that spec was to provide something like this. AFAIK it was never completed (I certainly didn't, because...reasons). 
There are a few environments using it, but not most and as a result the goal to have every parameter for every service documented was never realized. Disclaimer: I haven't worked on TripleO in years, so it's possible something else has happened since then to address this. On 7/9/20 11:50 AM, Ruslanas Gžibovskis wrote: > Hi all, > > 1) Is there a page or a draft, where all options of TripleO are available? > 2) Is there a page or a draft, where dependencies of each option are listed? > 3) Is there a page or a draft, where all possible values for each option > would be listed? > > -- > Ruslanas Gžibovskis > +370 6030 7030 From pierre-samuel.le-stang at corp.ovh.com Fri Jul 17 08:53:27 2020 From: pierre-samuel.le-stang at corp.ovh.com (Pierre-Samuel LE STANG) Date: Fri, 17 Jul 2020 10:53:27 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <21b85e64-5cbf-d6e1-a739-50b74d9585a2@goirand.fr> References: <20200716133127.GA31915@sync> <21b85e64-5cbf-d6e1-a739-50b74d9585a2@goirand.fr> Message-ID: <20200717085327.huq7ztefn7gkec5x@corp.ovh.com> Thomas Goirand wrote on ven. [2020-juil.-17 09:47:22 +0200]: > On 7/16/20 3:31 PM, Arnaud Morin wrote: > > Hello large-scalers! > > > > TLDR: we opensource a tool to help reducing size of databases. > > See https://github.com/ovh/osarchiver/ > > > > > > Few months ago, we released a tool, name osarchiver, which we are using > > on our production environment (at OVH) to help reduce the size of our > > tables in mariadb (or mysql) > > > > In fact, some tables are well know to grow very quickly. > > > > We use it, for example, to clean the OpenStack mistral database from old > > tasks, actions and executions which are older than a year. > > > > Another use case could be to archive some data in another table (e.g. with > > _archived as suffix) if they are 6 months old, and delete this data after > > 1 year. > > > > The source code of this tool is available here: > > https://github.com/ovh/osarchiver/ > > > > We were wondering if some other users would be interested in using the > > tool, and maybe move it under the opendev governance? > > > > Feel free to contact us and/or answer this thread. > > > > Cheers, > > Hi, > > That's very nice, thanks a lot for releasing such a thing. > > However, there's room for improvement if you would like to see your tool > shipped everywhere: > > - please define a requirements.txt > - please get the debian folder away from the main master branch, > especially considering it's using dh_virtualenv !!! > - please tag with a release number > > Also, with what release of OpenStack has this been tested? Is this bound > to a specific release? > > Cheers, > > Thomas Goirand (zigo) Hi Thomas, Thanks for your answer. We will update the repository accordingly. We tested OSArchiver on Newton and Stein releases of OpenStack. By design the tool is agnostic and rely only on the presence of the 'deleted_at' column so for now we do not expect to be bound to a specific release. Best regards, -- PS From thierry at openstack.org Fri Jul 17 13:11:01 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 17 Jul 2020 15:11:01 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: Arnaud Morin wrote: > [...] > The source code of this tool is available here: > https://github.com/ovh/osarchiver/ Thanks for sharing this tool! 
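Since the only thing the tool assumes is a deleted_at column, the pattern it automates boils down to something like the following (the table name and retention windows are purely illustrative, not statements OSArchiver literally issues):

    CREATE TABLE IF NOT EXISTS tasks_archived LIKE tasks;

    INSERT INTO tasks_archived
      SELECT * FROM tasks
      WHERE deleted_at IS NOT NULL
        AND deleted_at < NOW() - INTERVAL 6 MONTH;

    DELETE FROM tasks
      WHERE deleted_at IS NOT NULL
        AND deleted_at < NOW() - INTERVAL 6 MONTH;

    -- on a later run, purge archived rows older than one year
    DELETE FROM tasks_archived
      WHERE deleted_at < NOW() - INTERVAL 1 YEAR;

In practice the INSERT/DELETE steps have to run in bounded batches to avoid long locks on large production tables, which is exactly the kind of detail a dedicated tool is useful for.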
> We were wondering if some other users would be interested in using the > tool, and maybe move it under the opendev governance? I think this is one of those small operational tools that everyone ends up reinventing in their corner, duplicating effort. I support the idea of pushing it upstream, as it will make it easier for others to improve it, but also make the tool more discoverable. In terms of governance, we have several paths we could follow: 1/ we could create a specific project team to maintain this. That sounds completely overkill given the scope and size of the tool, and the fact that it's mostly feature-complete. Project teams are great to produce new "openstack" service components, but this is more peripheral operational tooling that would be released independently. 2/ we could adopt it at the Large Scale SIG, and promote it from there. I feel like this is useful beyond Large scale deployments though, so that sounds suboptimal 3/ during the last Opendev event we discussed reviving the OSops[1] idea: a lightweight area where operators can share the various small tools that they end up creating to help them operate OpenStack deployments. The effort has been dormant for a few years. I personally think the last option is the best, even if we need to figure a few things out before we can land this. I'll start a separate thread on OSops, and depending on how that goes, we'll choose between option 3 or option 2 for osarchiver. [1] https://wiki.openstack.org/wiki/Osops -- Thierry Carrez (ttx) From thierry at openstack.org Fri Jul 17 13:19:55 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 17 Jul 2020 15:19:55 +0200 Subject: [ops] Reviving OSOps ? In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> Hi everyone, During the last Opendev event we discussed reviving the OSops[1] idea: a lightweight area where operators can share the various small tools that they end up creating to help them operate OpenStack deployments. The effort has been mostly dormant for a few years. We had a recent thread[2] about osarchiver, a new operators helper, and whether it would make sense to push it upstream. I think the best option would be to revive OSops and land it there. Who is interested in helping to revive/maintain this ? If we revive it, I think we should move its repositories away from the catch-all "x" directory under opendev, which was created for projects that were not claimed by anyone during the big migration. If Osops should be considered distinct from OpenStack, then I'd recommend giving it its own opendev top directory, and move existing x/osops-* repositories to osops/*. If we'd like to make OSops a product of the OpenStack community (and have contributions to it be fully recognized as contributions to "OpenStack"), then I'd recommend creating a specific SIG dedicated to this, and move the x/osops-* repositories to openstack/osops-*. [1] https://wiki.openstack.org/wiki/Osops [2] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015977.html -- Thierry Carrez (ttx) From ignaziocassano at gmail.com Fri Jul 17 16:55:01 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 18:55:01 +0200 Subject: [openstack][octavia] transparent Message-ID: Hello all, I have some end users who want to receive on their load balanced web servers the client ip address for acl. 
They also want the https connection is terminated on web servers and not on load balancer. Can I solve with octavia ? I read haproxy can act as transparent only when it is the default router of backends. In our use case the default router is not the load balancer. Any help, please? Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Fri Jul 17 17:17:46 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 17 Jul 2020 10:17:46 -0700 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: Hi Ignazio, Currently the amphora driver does not support passing the client source IP directly to the backend member server. However there are a few ways to accomplish this using the amphora driver: 1. Use the proxy protocol for the pool. 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For header. To use the PROXY protocol you would set up the load balancer like this: 1. Create the load balancer. 2. Create the listener using HTTPS pass through, so either the "HTTPS" or "TCP" protocol. 3. Create the pool using the "PROXY" protocol option. 4. Add your members and health manager as you normally do. Then, on the web servers enable PROXY protocol. On apache this is via the mod_remoteip module and the RemoteIPProxyProtocol directive. See: https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol On nginx it is enabled with the "proxy_protocol" directive. See: https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ Pretty much every web server has support for it. Michael On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano wrote: > > Hello all, I have some end users who want to receive on their load balanced web servers the client ip address for acl. > They also want the https connection is terminated on web servers and not on load balancer. > Can I solve with octavia ? > I read haproxy can act as transparent only when it is the default router of backends. > In our use case the default router is not the load balancer. > Any help, please? > Ignazio > From fungi at yuggoth.org Fri Jul 17 17:32:20 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jul 2020 17:32:20 +0000 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: <20200717173219.hv2vahdznwyjf3k7@yuggoth.org> On 2020-07-17 18:55:01 +0200 (+0200), Ignazio Cassano wrote: > Hello all, I have some end users who want to receive on their load > balanced web servers the client ip address for acl. They also want > the https connection is terminated on web servers and not on load > balancer. Can I solve with octavia ? I read haproxy can act as > transparent only when it is the default router of backends. In our > use case the default router is not the load balancer. Any help, > please? You'll be hard pressed to find any network load balancer which can satisfy this combination of requirements without also requiring some cooperation from the gateway. The ways you typically get the client IP addresses to your servers are one of: 1. Use the load balancer as the default router for the servers so that it doesn't need to alter the IP addresses of the packets (layer 3 forwarding). 2. Terminate SSL/TLS on the load balancer so that it can insert X-Forwarded-For headers into the HTTP requests, and then optionally re-encrypt when sending along to the servers (layer 7 forwarding). 3. 
A "direct server return" configuration where the load balancer masquerades as the clients and only handles the inbound packets to the servers, while the outbound replies from the servers go directly to the Internet through their default gateway (asymmetric layer 3 forwarding with destination NAT). This is the only option which meets the list of requirements you posed and it's exceptionally messy to implement, since you can't rely on state tracking either on the load balancer or the default gateway (each of them only sees half of the connection). This can also thoroughly confuse your packet filtering depending on where in your network it's applied. A bit of quick searching doesn't turn up any available amphorae for Octavia which support DSR, but even if there were I expect you'd face challenges adapting Neutron and security groups to handle it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ignaziocassano at gmail.com Fri Jul 17 17:55:12 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 19:55:12 +0200 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: Many thanks, Michael. Ignazio Il Ven 17 Lug 2020, 19:17 Michael Johnson ha scritto: > Hi Ignazio, > > Currently the amphora driver does not support passing the client > source IP directly to the backend member server. > > However there are a few ways to accomplish this using the amphora driver: > 1. Use the proxy protocol for the pool. > 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For > header. > > To use the PROXY protocol you would set up the load balancer like this: > 1. Create the load balancer. > 2. Create the listener using HTTPS pass through, so either the "HTTPS" > or "TCP" protocol. > 3. Create the pool using the "PROXY" protocol option. > 4. Add your members and health manager as you normally do. > > Then, on the web servers enable PROXY protocol. > On apache this is via the mod_remoteip module and the > RemoteIPProxyProtocol directive. See: > > https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol > On nginx it is enabled with the "proxy_protocol" directive. See: > > https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ > > Pretty much every web server has support for it. > > Michael > > On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano > wrote: > > > > Hello all, I have some end users who want to receive on their load > balanced web servers the client ip address for acl. > > They also want the https connection is terminated on web servers and not > on load balancer. > > Can I solve with octavia ? > > I read haproxy can act as transparent only when it is the default router > of backends. > > In our use case the default router is not the load balancer. > > Any help, please? > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Jul 17 18:20:10 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 20:20:10 +0200 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: Hello Michael, I forgot to ask if the configuration you suggested can support acl for clients ip address. 
Ignazio Il Ven 17 Lug 2020, 19:17 Michael Johnson ha scritto: > Hi Ignazio, > > Currently the amphora driver does not support passing the client > source IP directly to the backend member server. > > However there are a few ways to accomplish this using the amphora driver: > 1. Use the proxy protocol for the pool. > 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For > header. > > To use the PROXY protocol you would set up the load balancer like this: > 1. Create the load balancer. > 2. Create the listener using HTTPS pass through, so either the "HTTPS" > or "TCP" protocol. > 3. Create the pool using the "PROXY" protocol option. > 4. Add your members and health manager as you normally do. > > Then, on the web servers enable PROXY protocol. > On apache this is via the mod_remoteip module and the > RemoteIPProxyProtocol directive. See: > > https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol > On nginx it is enabled with the "proxy_protocol" directive. See: > > https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ > > Pretty much every web server has support for it. > > Michael > > On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano > wrote: > > > > Hello all, I have some end users who want to receive on their load > balanced web servers the client ip address for acl. > > They also want the https connection is terminated on web servers and not > on load balancer. > > Can I solve with octavia ? > > I read haproxy can act as transparent only when it is the default router > of backends. > > In our use case the default router is not the load balancer. > > Any help, please? > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Jul 17 18:22:06 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 20:22:06 +0200 Subject: [openstack][octavia] transparent In-Reply-To: <20200717173219.hv2vahdznwyjf3k7@yuggoth.org> References: <20200717173219.hv2vahdznwyjf3k7@yuggoth.org> Message-ID: Many thanks, Jeremy Il Ven 17 Lug 2020, 19:42 Jeremy Stanley ha scritto: > On 2020-07-17 18:55:01 +0200 (+0200), Ignazio Cassano wrote: > > Hello all, I have some end users who want to receive on their load > > balanced web servers the client ip address for acl. They also want > > the https connection is terminated on web servers and not on load > > balancer. Can I solve with octavia ? I read haproxy can act as > > transparent only when it is the default router of backends. In our > > use case the default router is not the load balancer. Any help, > > please? > > You'll be hard pressed to find any network load balancer which can > satisfy this combination of requirements without also requiring some > cooperation from the gateway. The ways you typically get the client > IP addresses to your servers are one of: > > 1. Use the load balancer as the default router for the servers so > that it doesn't need to alter the IP addresses of the packets (layer > 3 forwarding). > > 2. Terminate SSL/TLS on the load balancer so that it can insert > X-Forwarded-For headers into the HTTP requests, and then optionally > re-encrypt when sending along to the servers (layer 7 forwarding). > > 3. 
A "direct server return" configuration where the load balancer > masquerades as the clients and only handles the inbound packets to > the servers, while the outbound replies from the servers go directly > to the Internet through their default gateway (asymmetric layer 3 > forwarding with destination NAT). This is the only option which > meets the list of requirements you posed and it's exceptionally > messy to implement, since you can't rely on state tracking either on > the load balancer or the default gateway (each of them only sees > half of the connection). This can also thoroughly confuse your > packet filtering depending on where in your network it's applied. > > A bit of quick searching doesn't turn up any available amphorae for > Octavia which support DSR, but even if there were I expect you'd > face challenges adapting Neutron and security groups to handle it. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jul 17 18:23:37 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jul 2020 18:23:37 +0000 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: <20200717182337.i7a2lwpatempbz2x@yuggoth.org> On 2020-07-17 17:17 +0000 (+0000), Michael Johnson write: [...] > To use the PROXY protocol you would set up the load balancer like this: > 1. Create the load balancer. > 2. Create the listener using HTTPS pass through, so either the "HTTPS" > or "TCP" protocol. > 3. Create the pool using the "PROXY" protocol option. > 4. Add your members and health manager as you normally do. > > Then, on the web servers enable PROXY protocol. > On apache this is via the mod_remoteip module and the > RemoteIPProxyProtocol directive. See: > > https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol > On nginx it is enabled with the "proxy_protocol" directive. See: > > https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ > > Pretty much every web server has support for it. [...] Neat! Somehow this is the first I've heard of it. An attempt at a formal specification seems to be published at http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt but I'm not finding any corresponding IETF RFC draft. I agree it looks like a viable solution to the question posed (so long as the LB and servers have support for this custom protocol/encapsulation). Way less problematic than DSR, just unfortunately handled as a de facto standard from what I can see, but looks like https://tools.ietf.org/id/draft-schwartz-tls-lb-00.html touches on ways to hopefully provide a more extensible solution in the future. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ignaziocassano at gmail.com Fri Jul 17 18:25:49 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 20:25:49 +0200 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: I mean acl on load balancer not on web servers..... Il Ven 17 Lug 2020, 20:20 Ignazio Cassano ha scritto: > Hello Michael, I forgot to ask if the configuration you suggested can > support acl for clients ip address. > Ignazio > > Il Ven 17 Lug 2020, 19:17 Michael Johnson ha > scritto: > >> Hi Ignazio, >> >> Currently the amphora driver does not support passing the client >> source IP directly to the backend member server. 
>> >> However there are a few ways to accomplish this using the amphora driver: >> 1. Use the proxy protocol for the pool. >> 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For >> header. >> >> To use the PROXY protocol you would set up the load balancer like this: >> 1. Create the load balancer. >> 2. Create the listener using HTTPS pass through, so either the "HTTPS" >> or "TCP" protocol. >> 3. Create the pool using the "PROXY" protocol option. >> 4. Add your members and health manager as you normally do. >> >> Then, on the web servers enable PROXY protocol. >> On apache this is via the mod_remoteip module and the >> RemoteIPProxyProtocol directive. See: >> >> https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol >> On nginx it is enabled with the "proxy_protocol" directive. See: >> >> https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ >> >> Pretty much every web server has support for it. >> >> Michael >> >> On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano >> wrote: >> > >> > Hello all, I have some end users who want to receive on their load >> balanced web servers the client ip address for acl. >> > They also want the https connection is terminated on web servers and >> not on load balancer. >> > Can I solve with octavia ? >> > I read haproxy can act as transparent only when it is the default >> router of backends. >> > In our use case the default router is not the load balancer. >> > Any help, please? >> > Ignazio >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at goirand.fr Fri Jul 17 07:47:22 2020 From: thomas at goirand.fr (Thomas Goirand) Date: Fri, 17 Jul 2020 09:47:22 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: <21b85e64-5cbf-d6e1-a739-50b74d9585a2@goirand.fr> On 7/16/20 3:31 PM, Arnaud Morin wrote: > Hello large-scalers! > > TLDR: we opensource a tool to help reducing size of databases. > See https://github.com/ovh/osarchiver/ > > > Few months ago, we released a tool, name osarchiver, which we are using > on our production environment (at OVH) to help reduce the size of our > tables in mariadb (or mysql) > > In fact, some tables are well know to grow very quickly. > > We use it, for example, to clean the OpenStack mistral database from old > tasks, actions and executions which are older than a year. > > Another use case could be to archive some data in another table (e.g. with > _archived as suffix) if they are 6 months old, and delete this data after > 1 year. > > The source code of this tool is available here: > https://github.com/ovh/osarchiver/ > > We were wondering if some other users would be interested in using the > tool, and maybe move it under the opendev governance? > > Feel free to contact us and/or answer this thread. > > Cheers, Hi, That's very nice, thanks a lot for releasing such a thing. However, there's room for improvement if you would like to see your tool shipped everywhere: - please define a requirements.txt - please get the debian folder away from the main master branch, especially considering it's using dh_virtualenv !!! - please tag with a release number Also, with what release of OpenStack has this been tested? Is this bound to a specific release? 
Cheers, Thomas Goirand (zigo) From alex.williamson at redhat.com Fri Jul 17 14:59:35 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 08:59:35 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200715082040.GA13136@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> Message-ID: <20200717085935.224ffd46@x1.home> On Wed, 15 Jul 2020 16:20:41 +0800 Yan Zhao wrote: > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > > if one device is able to migrate to another device before triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __ userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > > ________/__________ ___\|/_____________ > > > > > > | migration_version | | migration_version |-->check migration > > > > > > --------------------- --------------------- compatibility > > > > > > device A device B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > > - any one of the two devices does not have a migration_version attribute > > > > > > - error when reading from migration_version attribute of one device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". 
> > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. > > > > In the reply I just sent to Dan, I gave this example of what a > > "compatibility string" might look like represented as json: > > > > { > > "device_api": "vfio-pci", > > "vendor": "vendor-driver-name", > > "version": { > > "major": 0, > > "minor": 1 > > }, > > "vfio-pci": { // Based on above device_api > > "vendor": 0x1234, // Values for the exposed device > > "device": 0x5678, > > // Possibly further parameters for a more specific match > > }, > > "mdev_attrs": [ > > { "attribute0": "VALUE" } > > ] > > } > > > > Are you thinking that we might allow the vendor to include a vendor > > specific array where we'd simply require that both sides have matching > > fields and values? ie. > > > > "vendor_fields": [ > > { "unknown_field0": "unknown_value0" }, > > { "unknown_field1": "unknown_value1" }, > > ] > > > > We could certainly make that part of the spec, but I can't really > > figure the value of it other than to severely restrict compatibility, > > which the vendor could already do via the version.major value. Maybe > > they'd want to put a build timestamp, random uuid, or source sha1 into > > such a field to make absolutely certain compatibility is only determined > > between identical builds? Thanks, > > > Yes, I agree kernel could expose such sysfs interface to educate > openstack how to filter out devices. But I still think the proposed > migration_version (or rename to migration_compatibility) interface is > still required for libvirt to do double check. > > In the following scenario: > 1. openstack chooses the target device by reading sysfs interface (of json > format) of the source device. And Openstack are now pretty sure the two > devices are migration compatible. > 2. 
openstack asks libvirt to create the target VM with the target device > and start live migration. > 3. libvirt now receives the request. so it now has two choices: > (1) create the target VM & target device and start live migration directly > (2) double check if the target device is compatible with the source > device before doing the remaining tasks. > > Because the factors to determine whether two devices are live migration > compatible are complicated and may be dynamically changing, (e.g. driver > upgrade or configuration changes), and also because libvirt should not > totally rely on the input from openstack, I think the cost for libvirt is > relatively lower if it chooses to go (2) than (1). At least it has no > need to cancel migration and destroy the VM if it knows it earlier. > > So, it means the kernel may need to expose two parallel interfaces: > (1) with json format, enumerating all possible fields and comparing > methods, so as to indicate openstack how to find a matching target device > (2) an opaque driver defined string, requiring write and test in target, > which is used by libvirt to make sure device compatibility, rather than > rely on the input accurateness from openstack or rely on kernel driver > implementing the compatibility detection immediately after migration > start. > > Does it make sense? No, libvirt is not responsible for the success or failure of the migration, it's the vendor driver's responsibility to encode compatibility information early in the migration stream and error should the incoming device prove to be incompatible. It's not libvirt's job to second guess the management engine and I would not support a duplicate interface only for that purpose. Thanks, Alex From alex.williamson at redhat.com Fri Jul 17 15:18:54 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 09:18:54 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: <20200717091854.72013c91@x1.home> On Wed, 15 Jul 2020 15:37:19 +0800 Alex Xu wrote: > Alex Williamson 于2020年7月15日周三 上午5:00写道: > > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that > > helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > > are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of > > the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step to > > check > > > > > > if one device is able to migrate to another device before > > triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. 
> > please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __ userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > > ________/__________ ___\|/_____________ > > > > > > | migration_version | | migration_version |-->check migration > > > > > > --------------------- --------------------- compatibility > > > > > > device A device B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > > device's > > > > > > sysfs node. e.g. > > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the > > source device, > > > > > > and write it to the migration_version sysfs attribute in the > > target device. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices > > not compatible: > > > > > > - any one of the two devices does not have a migration_version > > attribute > > > > > > - error when reading from migration_version attribute of one device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by > > device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > > "aggregator count". > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > > driver name to > > > > > > each migration_version string. e.g. > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver > > version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. 
> > > > In the reply I just sent to Dan, I gave this example of what a > > "compatibility string" might look like represented as json: > > > > { > > "device_api": "vfio-pci", > > "vendor": "vendor-driver-name", > > "version": { > > "major": 0, > > "minor": 1 > > }, > > > > The OpenStack Placement service doesn't support to filtering the target > host by the semver syntax, altough we can code this filtering logic inside > scheduler filtering by python code. Basically, placement only supports > filtering the host by traits (it is same thing with labels, tags). The nova > scheduler will call the placement service to filter the hosts first, then > go through all the scheduler filters. That would be great if the placement > service can filter out more hosts which isn't compatible first, and then it > is better. > > > > "vfio-pci": { // Based on above device_api > > "vendor": 0x1234, // Values for the exposed device > > "device": 0x5678, > > // Possibly further parameters for a more specific match > > }, > > > > OpenStack already based on vendor and device id to separate the devices > into the different resource pool, then the scheduler based on that to filer > the hosts, so I think it needn't be the part of this compatibility string. This is the part of the string that actually says what the resulting device is, so it's a rather fundamental part of the description. This is where we'd determine that a physical to mdev migration is possible or that different mdev types result in the same guest PCI device, possibly with attributes set as specified later in the output. > > "mdev_attrs": [ > > { "attribute0": "VALUE" } > > ] > > } > > > > Are you thinking that we might allow the vendor to include a vendor > > specific array where we'd simply require that both sides have matching > > fields and values? ie. That's what I'm defining in the below vendor_fields, the above mdev_attrs would be specifying attributes of the device that must be set in order to create the device with the compatibility described. For example if we're describing compatibility for type foo-1, which is a base type that can be equivalent to type foo-3 if type foo-1 is created with aggregation=3, this is where that would be defined. Thanks, Alex > > "vendor_fields": [ > > { "unknown_field0": "unknown_value0" }, > > { "unknown_field1": "unknown_value1" }, > > ] > > > > Since the placement support traits (labels, tags), so the placement just to > matching those fields, so it isn't problem of openstack, since openstack > needn't to know the meaning of those fields. But the traits is just a > label, it isn't key-value format. But also if we have to, we can code this > scheduler filter by python code. But the same thing as above, the invalid > host can't be filtered out in the first step placement service filtering. > > > > We could certainly make that part of the spec, but I can't really > > figure the value of it other than to severely restrict compatibility, > > which the vendor could already do via the version.major value. Maybe > > they'd want to put a build timestamp, random uuid, or source sha1 into > > such a field to make absolutely certain compatibility is only determined > > between identical builds? 
Thanks, > > > > Alex > > > > From alex.williamson at redhat.com Fri Jul 17 16:12:58 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 10:12:58 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200716083230.GA25316@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> Message-ID: <20200717101258.65555978@x1.home> On Thu, 16 Jul 2020 16:32:30 +0800 Yan Zhao wrote: > On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: > > > > On 2020/7/14 上午7:29, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to check > > > if one device is able to migrate to another device before triggering a real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each device's > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > Are you aware of the devlink based device management interface that is > > proposed upstream? I think it has many advantages over sysfs, do you > > consider to switch to that? Advantages, such as? > not familiar with the devlink. will do some research of it. > > > > > > > userspace tools read the migration_version as a string from the source device, > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > - any one of the two devices does not have a migration_version attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > > > > > > My understanding is that something opaque to userspace is not the philosophy > > but the VFIO live migration in itself is essentially a big opaque stream to userspace. > > > of Linux. 
Instead of having a generic API but opaque value, why not do in a > > vendor specific way like: > > > > 1) exposing the device capability in a vendor specific way via sysfs/devlink > > or other API > > 2) management read capability in both src and dst and determine whether we > > can do the migration > > > > This is the way we plan to do with vDPA. > > > yes, in another reply, Alex proposed to use an interface in json format. > I guess we can define something like > > { "self" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1", > "pv-mode" : "none", > } > ], > "compatible" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v2", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none, ppgtt, context", > } > ... > ] > } > > But as those fields are mostly vendor specific, the userspace can > only do simple string comparing, I guess the list would be very long as > it needs to enumerate all possible targets. This ignores so much of what I tried to achieve in my example :( > also, in some fileds like "gvt-version", is there a simple way to express > things like v2+? That's not a reasonable thing to express anyway, how can you be certain that v3 won't break compatibility with v2? Sean proposed a versioning scheme that accounts for this, using an x.y.z version expressing the major, minor, and bugfix versions, where there is no compatibility across major versions, minor versions have forward compatibility (ex. 1 -> 2 is ok, 2 -> 1 is not) and bugfix version number indicates some degree of internal improvement that is not visible to the user in terms of features or compatibility, but provides a basis for preferring equally compatible candidates. > If the userspace can read this interface both in src and target and > check whether both src and target are in corresponding compatible list, I > think it will work for us. > > But still, kernel should not rely on userspace's choice, the opaque > compatibility string is still required in kernel. No matter whether > it would be exposed to userspace as an compatibility checking interface, > vendor driver would keep this part of code and embed the string into the > migration stream. so exposing it as an interface to be used by libvirt to > do a safety check before a real live migration is only about enabling > the kernel part of check to happen ahead. As you indicate, the vendor driver is responsible for checking version information embedded within the migration stream. Therefore a migration should fail early if the devices are incompatible. Is it really libvirt's place to second guess what it has been directed to do? Why would we even proceed to design a user parse-able version interface if we still have a dependency on an opaque interface? Thanks, Alex From dgilbert at redhat.com Fri Jul 17 18:03:44 2020 From: dgilbert at redhat.com (Dr. 
David Alan Gilbert) Date: Fri, 17 Jul 2020 19:03:44 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200717085935.224ffd46@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> <20200717085935.224ffd46@x1.home> Message-ID: <20200717180344.GD3294@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Wed, 15 Jul 2020 16:20:41 +0800 > Yan Zhao wrote: > > > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > > On Tue, 14 Jul 2020 18:19:46 +0100 > > > "Dr. David Alan Gilbert" wrote: > > > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > > hi folks, > > > > > > > we are defining a device migration compatibility interface that helps upper > > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > > > live migration compatible. > > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > > > e.g. we could use it to check whether > > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > > > if one device is able to migrate to another device before triggering a real > > > > > > > live migration procedure. > > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > > The interface is defined in below way: > > > > > > > > > > > > > > __ userspace > > > > > > > /\ \ > > > > > > > / \write > > > > > > > / read \ > > > > > > > ________/__________ ___\|/_____________ > > > > > > > | migration_version | | migration_version |-->check migration > > > > > > > --------------------- --------------------- compatibility > > > > > > > device A device B > > > > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > > > - any one of the two devices does not have a migration_version attribute > > > > > > > - error when reading from migration_version attribute of one device > > > > > > > - error when writing migration_version string of one device to > > > > > > > migration_version attribute of the other device > > > > > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > > > driver and is completely opaque to the userspace. 
> > > > > > > for a Intel vGPU, string format can be defined like > > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > > > for a QAT VF, it may be > > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > > the contents of that opaque string. The point is that its contents > > > > > are defined by the vendor driver to describe the device, driver version, > > > > > and possibly metadata about the configuration of the device. One > > > > > instance of a device might generate a different string from another. > > > > > The string that a device produces is not necessarily the only string > > > > > the vendor driver will accept, for example the driver might support > > > > > backwards compatible migrations. > > > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > > this string; I'd expect to have an ID and version that are human > > > > readable, maybe a device ID/name that's human interpretable and then a > > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > > > I'm thinking that we want to be able to report problems and include the > > > > string and the user to be able to easily identify the device that was > > > > complaining and notice a difference in versions, and perhaps also use > > > > it in compatibility patterns to find compatible hosts; but that does > > > > get tricky when it's a 'ask the device if it's compatible'. > > > > > > In the reply I just sent to Dan, I gave this example of what a > > > "compatibility string" might look like represented as json: > > > > > > { > > > "device_api": "vfio-pci", > > > "vendor": "vendor-driver-name", > > > "version": { > > > "major": 0, > > > "minor": 1 > > > }, > > > "vfio-pci": { // Based on above device_api > > > "vendor": 0x1234, // Values for the exposed device > > > "device": 0x5678, > > > // Possibly further parameters for a more specific match > > > }, > > > "mdev_attrs": [ > > > { "attribute0": "VALUE" } > > > ] > > > } > > > > > > Are you thinking that we might allow the vendor to include a vendor > > > specific array where we'd simply require that both sides have matching > > > fields and values? ie. > > > > > > "vendor_fields": [ > > > { "unknown_field0": "unknown_value0" }, > > > { "unknown_field1": "unknown_value1" }, > > > ] > > > > > > We could certainly make that part of the spec, but I can't really > > > figure the value of it other than to severely restrict compatibility, > > > which the vendor could already do via the version.major value. Maybe > > > they'd want to put a build timestamp, random uuid, or source sha1 into > > > such a field to make absolutely certain compatibility is only determined > > > between identical builds? Thanks, > > > > > Yes, I agree kernel could expose such sysfs interface to educate > > openstack how to filter out devices. 
But I still think the proposed > > migration_version (or rename to migration_compatibility) interface is > > still required for libvirt to do double check. > > > > In the following scenario: > > 1. openstack chooses the target device by reading sysfs interface (of json > > format) of the source device. And Openstack are now pretty sure the two > > devices are migration compatible. > > 2. openstack asks libvirt to create the target VM with the target device > > and start live migration. > > 3. libvirt now receives the request. so it now has two choices: > > (1) create the target VM & target device and start live migration directly > > (2) double check if the target device is compatible with the source > > device before doing the remaining tasks. > > > > Because the factors to determine whether two devices are live migration > > compatible are complicated and may be dynamically changing, (e.g. driver > > upgrade or configuration changes), and also because libvirt should not > > totally rely on the input from openstack, I think the cost for libvirt is > > relatively lower if it chooses to go (2) than (1). At least it has no > > need to cancel migration and destroy the VM if it knows it earlier. > > > > So, it means the kernel may need to expose two parallel interfaces: > > (1) with json format, enumerating all possible fields and comparing > > methods, so as to indicate openstack how to find a matching target device > > (2) an opaque driver defined string, requiring write and test in target, > > which is used by libvirt to make sure device compatibility, rather than > > rely on the input accurateness from openstack or rely on kernel driver > > implementing the compatibility detection immediately after migration > > start. > > > > Does it make sense? > > No, libvirt is not responsible for the success or failure of the > migration, it's the vendor driver's responsibility to encode > compatibility information early in the migration stream and error > should the incoming device prove to be incompatible. It's not > libvirt's job to second guess the management engine and I would not > support a duplicate interface only for that purpose. Thanks, libvirt does try to enforce it for other things; trying to stop a bad migration from starting. Dave > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From alex.williamson at redhat.com Fri Jul 17 18:30:26 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 12:30:26 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200717180344.GD3294@work-vm> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> <20200717085935.224ffd46@x1.home> <20200717180344.GD3294@work-vm> Message-ID: <20200717123026.6ab26442@x1.home> On Fri, 17 Jul 2020 19:03:44 +0100 "Dr. David Alan Gilbert" wrote: > * Alex Williamson (alex.williamson at redhat.com) wrote: > > On Wed, 15 Jul 2020 16:20:41 +0800 > > Yan Zhao wrote: > > > > > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > > > On Tue, 14 Jul 2020 18:19:46 +0100 > > > > "Dr. David Alan Gilbert" wrote: > > > > > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > > > Daniel P. 
Berrangé wrote: > > > > > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > > > hi folks, > > > > > > > > we are defining a device migration compatibility interface that helps upper > > > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > > > > live migration compatible. > > > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > > > > e.g. we could use it to check whether > > > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > > > > if one device is able to migrate to another device before triggering a real > > > > > > > > live migration procedure. > > > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > > > The interface is defined in below way: > > > > > > > > > > > > > > > > __ userspace > > > > > > > > /\ \ > > > > > > > > / \write > > > > > > > > / read \ > > > > > > > > ________/__________ ___\|/_____________ > > > > > > > > | migration_version | | migration_version |-->check migration > > > > > > > > --------------------- --------------------- compatibility > > > > > > > > device A device B > > > > > > > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > > > > - any one of the two devices does not have a migration_version attribute > > > > > > > > - error when reading from migration_version attribute of one device > > > > > > > > - error when writing migration_version string of one device to > > > > > > > > migration_version attribute of the other device > > > > > > > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > > > > driver and is completely opaque to the userspace. > > > > > > > > for a Intel vGPU, string format can be defined like > > > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > > > > > for a QAT VF, it may be > > > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > > > the contents of that opaque string. 
The point is that its contents > > > > > > are defined by the vendor driver to describe the device, driver version, > > > > > > and possibly metadata about the configuration of the device. One > > > > > > instance of a device might generate a different string from another. > > > > > > The string that a device produces is not necessarily the only string > > > > > > the vendor driver will accept, for example the driver might support > > > > > > backwards compatible migrations. > > > > > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > > > this string; I'd expect to have an ID and version that are human > > > > > readable, maybe a device ID/name that's human interpretable and then a > > > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > > > > > I'm thinking that we want to be able to report problems and include the > > > > > string and the user to be able to easily identify the device that was > > > > > complaining and notice a difference in versions, and perhaps also use > > > > > it in compatibility patterns to find compatible hosts; but that does > > > > > get tricky when it's a 'ask the device if it's compatible'. > > > > > > > > In the reply I just sent to Dan, I gave this example of what a > > > > "compatibility string" might look like represented as json: > > > > > > > > { > > > > "device_api": "vfio-pci", > > > > "vendor": "vendor-driver-name", > > > > "version": { > > > > "major": 0, > > > > "minor": 1 > > > > }, > > > > "vfio-pci": { // Based on above device_api > > > > "vendor": 0x1234, // Values for the exposed device > > > > "device": 0x5678, > > > > // Possibly further parameters for a more specific match > > > > }, > > > > "mdev_attrs": [ > > > > { "attribute0": "VALUE" } > > > > ] > > > > } > > > > > > > > Are you thinking that we might allow the vendor to include a vendor > > > > specific array where we'd simply require that both sides have matching > > > > fields and values? ie. > > > > > > > > "vendor_fields": [ > > > > { "unknown_field0": "unknown_value0" }, > > > > { "unknown_field1": "unknown_value1" }, > > > > ] > > > > > > > > We could certainly make that part of the spec, but I can't really > > > > figure the value of it other than to severely restrict compatibility, > > > > which the vendor could already do via the version.major value. Maybe > > > > they'd want to put a build timestamp, random uuid, or source sha1 into > > > > such a field to make absolutely certain compatibility is only determined > > > > between identical builds? Thanks, > > > > > > > Yes, I agree kernel could expose such sysfs interface to educate > > > openstack how to filter out devices. But I still think the proposed > > > migration_version (or rename to migration_compatibility) interface is > > > still required for libvirt to do double check. > > > > > > In the following scenario: > > > 1. openstack chooses the target device by reading sysfs interface (of json > > > format) of the source device. And Openstack are now pretty sure the two > > > devices are migration compatible. > > > 2. openstack asks libvirt to create the target VM with the target device > > > and start live migration. > > > 3. libvirt now receives the request. 
so it now has two choices: > > > (1) create the target VM & target device and start live migration directly > > > (2) double check if the target device is compatible with the source > > > device before doing the remaining tasks. > > > > > > Because the factors to determine whether two devices are live migration > > > compatible are complicated and may be dynamically changing, (e.g. driver > > > upgrade or configuration changes), and also because libvirt should not > > > totally rely on the input from openstack, I think the cost for libvirt is > > > relatively lower if it chooses to go (2) than (1). At least it has no > > > need to cancel migration and destroy the VM if it knows it earlier. > > > > > > So, it means the kernel may need to expose two parallel interfaces: > > > (1) with json format, enumerating all possible fields and comparing > > > methods, so as to indicate openstack how to find a matching target device > > > (2) an opaque driver defined string, requiring write and test in target, > > > which is used by libvirt to make sure device compatibility, rather than > > > rely on the input accurateness from openstack or rely on kernel driver > > > implementing the compatibility detection immediately after migration > > > start. > > > > > > Does it make sense? > > > > No, libvirt is not responsible for the success or failure of the > > migration, it's the vendor driver's responsibility to encode > > compatibility information early in the migration stream and error > > should the incoming device prove to be incompatible. It's not > > libvirt's job to second guess the management engine and I would not > > support a duplicate interface only for that purpose. Thanks, > > libvirt does try to enforce it for other things; trying to stop a bad > migration from starting. Even if libvirt did want to verify why would we want to support a separate opaque interface for that purpose versus a parse-able interface? If we get different results, we've failed. Thanks, Alex From johnsomor at gmail.com Fri Jul 17 19:12:20 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 17 Jul 2020 12:12:20 -0700 Subject: [octavia] Proposal to deprecate the amphora spares pool Message-ID: Back at the Victoria PTG the Octavia team discussed deprecating the spares pool capability of the amphora driver[1]. This would follow the standard OpenStack deprecation process[2]. There are a number of reasons this was proposed: 1. It adds a lot of complexity to the code. 2. It can't be used with Active/Standby load balancers due to server group (anti-affinity) limitations in Nova. 3. It provides only 15-30 seconds of speedup when provisioning a new load balancer on production clouds. 4. It makes supporting Octavia availability zones awkward as we have to boot spares instances in each AZ. 5. It can be confusing for people when it is enabled as there are always extra amphora running and being automatically recreated. Due to these reasons a patch has been proposed to deprecate spares pool support in the amphora driver: https://review.opendev.org/741686 Please comment on that patch and/or join the weekly Octavia IRC meeting if you have any concerns with this deprecation plan. 
Michael [1] https://etherpad.opendev.org/p/octavia-virtual-V-ptg [2] https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html From sean.mcginnis at gmx.com Fri Jul 17 22:43:33 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 17 Jul 2020 17:43:33 -0500 Subject: [PTL][Stable] Releases proposed for stable/stein Message-ID: <821d2cfa-1aeb-d532-0b56-db3918ab0215@gmx.com> /me takes off release team hat and puts on stable team hat Hey everyone, To help out with stable releases, I've run a script to propose releases for any deliverables in stable/stein that had commits merged but not released yet. This is just to try to help make sure those fixes get out downstream, and to help ease the crunch that we inevitably have near the time that stable/stein goes into Extended Maintenance mode (this coming November). These are not driven by the release team, and they are not required. They are merely a convenience to help out the teams. If there is a patch for any deliverables owned by your team and you are good with the release, please leave a +1 and we will process it. Any patches with a -1, or anything not acknowledged by the end of next week, will just be abandoned. Of course, stable releases can be proposed by the team whenever they are ready. Again, this is not a release team activity. This may or may not be done regularly. I just had some time and an itch to do it. Patches can be found here: https://review.opendev.org/#/q/topic:stein-stable+(status:open+OR+status:merged) Thanks! Sean From anilj.mailing at gmail.com Sat Jul 18 06:00:19 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Fri, 17 Jul 2020 23:00:19 -0700 Subject: RabbitMQ consumer connection is refused when trying to read notifications In-Reply-To: References: Message-ID: Hi, Can someone please provide some clue on this issue? /anil. On Thu, Jul 16, 2020 at 2:41 PM Anil Jangam wrote: > Hi, > > I followed the video and the steps provided in this video link and the > consumer connection is being refused. > > https://www.openstack.org/videos/summits/denver-2019/nova-versioned-notifications-the-result-of-a-3-year-journey > > /etc/nova/nova.conf file changes.. > [notifications] > notify_on_state_change=vm_state > default_level=INFO > notification_format=both > > [oslo_messaging_notifications] > driver=messagingv2 > transport_url=rabbit://guest:guest at 10.30.8.57:5672/ > topics=notification > retry=-1 > > The python consume code is as follows (followed the example provided in > the video: > transport = oslo_messaging.get_notification_transport( > cfg.CONF, url='rabbit://guest:guest at 10.30.8.57:5672/') > targets = [ > oslo_messaging.Target(topic='versioned_notifications'), > ] > > Am I missing any other configuration in any of the services in OpenStack? > > Let me know if you need any other info. > > /anil. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anilj.mailing at gmail.com Sat Jul 18 06:13:41 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Fri, 17 Jul 2020 23:13:41 -0700 Subject: SDK API to get the version of openstack distribution Message-ID: Hi, I am able to iterate through the list of hypervisors and servers as follows. for hypervisor in self.connection.list_hypervisors(): for server in self.connection.compute.servers(): However, I could not find an API that returns the version of the OpenStack i.e. whether it is Stein, Train, or Ussuri. Openstack CLI client has command: openstack versions show Thanks, /anil. 
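[Editor's note] For reference, a minimal end-to-end consumer built on the same oslo.messaging calls as the snippet above might look like the sketch below. This is illustrative only: the broker URL and credentials are placeholders and must match the transport_url the nova services actually publish notifications to, and the endpoint methods simply print whatever they receive.

import oslo_messaging
from oslo_config import cfg


class NotificationEndpoint(object):
    # Versioned notifications are emitted at INFO level (e.g. instance.update),
    # so they arrive here.
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        print(event_type, payload)

    # Error-level notifications end up here.
    def error(self, ctxt, publisher_id, event_type, payload, metadata):
        print(event_type, payload)


# Placeholder URL: point this at the broker the nova services publish to.
transport = oslo_messaging.get_notification_transport(
    cfg.CONF, url='rabbit://guest:guest@10.30.8.57:5672/')
targets = [oslo_messaging.Target(topic='versioned_notifications')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [NotificationEndpoint()], executor='threading')
listener.start()
listener.wait()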
-------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sat Jul 18 09:38:20 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sat, 18 Jul 2020 11:38:20 +0200 Subject: [qa][dev][all] Gate issues with devstack glance standalone w/o tls-proxy Message-ID: Morning, Folks! It seems the devstack glance standalone mode (the new default) is broken at the moment if not using tls-proxy. If your jobs break on g-api not coming up, then this is the likely case. So far it seems to have hit Neutron and Nodepool jobs (and hence also SDK and DIB for example). Please refrain from rechecking until solved. -yoctozepto From gmann at ghanshyammann.com Sat Jul 18 18:59:31 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 18 Jul 2020 13:59:31 -0500 Subject: [qa][dev][all] Gate issues with devstack glance standalone w/o tls-proxy In-Reply-To: References: Message-ID: <173634bb16c.e1bb411a266116.18079246085083872@ghanshyammann.com> ---- On Sat, 18 Jul 2020 04:38:20 -0500 Radosław Piliszek wrote ---- > Morning, Folks! > > It seems the devstack glance standalone mode (the new default) is > broken at the moment if not using tls-proxy. If your jobs break on > g-api not coming up, then this is the likely case. > So far it seems to have hit Neutron and Nodepool jobs (and hence also > SDK and DIB for example). > Please refrain from rechecking until solved. Fix is merged now, you can recheck. -gmann > > -yoctozepto > > From zigo at debian.org Sat Jul 18 20:26:17 2020 From: zigo at debian.org (Thomas Goirand) Date: Sat, 18 Jul 2020 22:26:17 +0200 Subject: Floating IP's for routed networks In-Reply-To: <007d6225-12ef-69d7-6c76-45c093909297@debian.org> References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> <007d6225-12ef-69d7-6c76-45c093909297@debian.org> Message-ID: <49b2f5a0-767d-961d-9406-5b599a35d38b@debian.org> On 7/16/20 2:56 PM, Thomas Goirand wrote: > On 7/15/20 4:09 PM, Rodolfo Alonso Hernandez wrote: >> Hi Thomas: >> >> If I'm not wrong, the goal of this filtering is to remove all those >> subnets with service_type='network:routed'. Maybe you can check >> implementing an easier query: >> SELECT subnets.segment_id AS subnets_segment_id >> FROM subnets >> WHERE subnets.network_id = %(network_id_1)s AND NOT (EXISTS (SELECT * >> FROM subnet_service_types >> WHERE subnets.id = subnet_service_types.subnet_id >> AND subnet_service_types.service_type = %(service_type_1)s)) >> >> That will be translated to python as: >> >> query = test_db.context.session.query(subnet_obj.Subnet.db_model.segment_id) >> query = query.filter(subnet_obj.Subnet.db_model.network_id == network_id) >> if filtered_service_type: >> query = query.filter(~exists().where(and_( >> subnet_obj.Subnet.db_model.id == service_type_model.subnet_id, >> service_type_model.service_type == filtered_service_type))) >> >> Can you provide a UTs or a way to check the problem you are experiencing? >> >> Regards. > > Hi Rodolfo, > > Thanks for your help. 
> > I tried translating what you wrote above into a working code (ie: fixing > a few variables here and there), which I sent as a new PR here: > https://review.opendev.org/#/c/741429/ > > However, printing the result from SQLAlchemy shows that > get_subnet_segment_ids() still returns None together with my other 2 > subnets, so something must still be wrong. > > I'm not yet to the point I can write unit tests, just trying the code > locally for the moment. > > Cheers, > > Thomas Goirand (zigo) Rodolfo, You are right that the purpose is to filter subnets with service_type='network:routed' However, if I add: if segment_id in the: return [segment_id for (segment_id,) in query.all() if segment_id] then this doesn't work, because _validate_segment will never return 400 whenever there is a non-valid request, which defeats the purpose of this function. I removed the "if segment_id" and now the patch passes unit tests. See: https://review.opendev.org/669395 However, it's still not possible to provision a subnet with --service-type='network:routed', and at this point, I don't understand what's going wrong, and why get_subnet_segment_ids is returning None for the new subnet I'm trying to create, when this is supposed to be filtered. Is it possible that the service_type table isn't written yet at the time of the call of get_subnet_segment_ids()? I'd like to add a test, to me it looks like I should do it here: neutron/tests/unit/extensions/test_segment.py using as model: test_only_some_subnets_associated_not_allowed() by just adding service_type='network:routed', and expecting it to succeed. However, how do I add a service-type when creating the subnet? It doesn't look like this exists in this test framework. Any suggestion? Cheers, Thomas Goirand (zigo) From reza.b2008 at gmail.com Sun Jul 19 07:07:34 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Sun, 19 Jul 2020 11:37:34 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: As Ruslanas guided, the problem was solved by disabling gpgcheck. For me there was no need of enabling HA repos. I think this process should be reported as a bug. Unfortunately, now my overcloud installation fails with: ... TASK [tripleo_podman : ensure podman and deps are installed] ******************* task path: /usr/share/ansible/roles/tripleo_podman/tasks/tripleo_podman_install.yml:21 Saturday 18 July 2020 15:04:29 +0430 (0:00:00.193) 0:04:37.581 ********* Running dnf Using module file /usr/lib/python3.6/site-packages/ansible/modules/packaging/os/dnf.py ... fatal: [overcloud-controller-0]: FAILED! => changed=false failures: - No package buildah available. invocation: module_args: allow_downgrade: false autoremove: false bugfix: false conf_file: null disable_excludes: null disable_gpg_check: false disable_plugin: [] disablerepo: [] download_dir: null download_only: false enable_plugin: [] enablerepo: [] exclude: [] install_repoquery: true install_weak_deps: true installroot: / list: null lock_timeout: 30 name: - podman - buildah releasever: null security: false skip_broken: false state: latest update_cache: false update_only: false validate_certs: true msg: Failed to install some of the specified packages rc: 1 results: [] ... Do you think the above error is something related to repos? On Tue, 14 Jul 2020 at 18:20, Ruslanas Gžibovskis wrote: > I am not sure, but that might help. 
I use these steps for deployment: > > cp -ar /etc/yum.repos.d repos > sed -i s/gpgcheck=1/gpgcheck=0/g repos/*repo > export DIB_YUM_REPO_CONF="$(ls /home/stack/repos/*repo)" > export STABLE_RELEASE="ussuri" > export > OS_YAML="/usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml" > source /home/stack/stackrc > mkdir /home/stack/images > cd /home/stack/images > openstack overcloud image build --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml > --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > && openstack overcloud image upload --update-existing > cd /home/stack > ls /home/stack/images > > this works for all packages except: > > pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini > openstack-selinux pacemaker pcs > > to solve these you need to enable in repos dir HA repo (change in enable=0 > to enable=1 > and then this will solve you issues with most except: > osops-tools-monitoring-oschecks > > this one, you can change by: > modify line in file: > /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map > to have this line: > "oschecks_package": "sysstat" > instead of "oschecks_package": "osops-tools-monitoring-oschecks > > " > > > > > On Tue, 14 Jul 2020 at 15:14, Alex Schultz wrote: > >> On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi >> wrote: >> > >> > Thanks for your information. >> > Actually, I was in doubt of using Ussuri (latest version) for my >> environment. >> > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but >> overcloud image build got some error: >> > >> > $ openstack overcloud image build --config-file >> /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml >> --config-file >> /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml >> > >> > ... >> > 2020-07-14 12:14:22.714 | Running install-packages install. >> > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient >> python3-barbicanclient python3-cinderclient python3-designateclient >> python3-glanceclient python3-gnocchiclient python3-heatclient >> python3-ironicclient python3-keystoneclient python3-manilaclient >> python3-mistralclient python3-neutronclient python3-novaclient >> python3-openstackclient python3-pankoclient python3-saharaclient >> python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony >> pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning >> osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman >> libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch >> openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt >> ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs >> > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, >> config-manager, copr, debug, debuginfo-install, download, >> generate_completion_cache, needs-restarting, playground, repoclosure, >> repodiff, repograph, repomanage, reposync >> > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 >> > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum >> > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS >> Linux 8; generic; Linux.x86_64)' >> > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream >> > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 >> 23:25:16 2020. 
>> > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS >> > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 >> 23:25:12 2020. >> > 2020-07-14 12:14:23.517 | repo: using cache for: extras >> > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 >> 00:15:26 2020. >> > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago >> on Tue Jul 14 11:43:38 2020. >> > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion >> cache... >> > 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient >> > 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient >> > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient >> > 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient >> > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient >> > 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient >> > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient >> > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient >> > 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient >> > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient >> > 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient >> > 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient >> > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient >> > 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient >> > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient >> > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient >> > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient >> > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient >> > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is >> already installed. >> > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already >> installed. >> > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote >> > 2020-07-14 12:14:23.929 | No match for argument: >> osops-tools-monitoring-oschecks >> > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker >> > 2020-07-14 12:14:23.936 | No match for argument: crudini >> > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux >> > 2020-07-14 12:14:23.953 | No match for argument: pacemaker >> > 2020-07-14 12:14:23.957 | No match for argument: pcs >> > 2020-07-14 12:14:23.961 | Error: Unable to find a match: >> python3-aodhclient python3-barbicanclient python3-cinderclient >> python3-designateclient python3-glanceclient python3-gnocchiclient >> python3-heatclient python3-ironicclient python3-keystoneclient >> python3-manilaclient python3-mistralclient python3-neutronclient >> python3-novaclient python3-openstackclient python3-pankoclient >> python3-saharaclient python3-swiftclient python3-zaqarclient >> pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini >> openstack-selinux pacemaker pcs >> > >> > Do you have any idea? 
>> > >> >> Seems like you are missing the correct DIP_YUM_REPO_CONF setting per >> #3 from >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images >> >> > >> > >> > On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: >> >> >> >> Hi folks, >> >> >> >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz >> wrote: >> >>> >> >>> I don't believe centos8 containers are available for Train yet. The >> >>> error you're hitting is because it's fetching centos7 containers and >> >>> the ironic container is not backwards compatible between the two >> >>> versions. If you want centos8, use Ussuri. >> >>> >> >> >> >> fyi we started pushing centos8 train last week - slightly different >> namespace - latest current-tripleo containers are pushed to >> https://hub.docker.com/u/tripleotraincentos8 >> >> >> >> hope it helps >> >> >> >>> >> >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi < >> reza.b2008 at gmail.com> wrote: >> >>> > >> >>> > I found following error in ironic and container-puppet-ironic >> container log during installation: >> >>> > >> >>> > puppet-user: Error: >> /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: >> Could not evaluate: Could not retrieve information from environment >> production source(s) file:/tftpboot/ldlinux.c32 >> >>> > >> >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi < >> reza.b2008 at gmail.com> wrote: >> >>> >> >> >>> >> Hi, >> >>> >> >> >>> >> I'm going to install OpenStack Train with the help of TripleO on >> CentOS 8, but undercloud installation fails with the following error: >> >>> >> >> >>> >> "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen >> 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping >> because of failed dependencies", "puppet-user: Notice: Applied catalog in >> 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: >> 97", "puppet-user: Events:", "puppet-user: Failure: 1", >> "puppet-user: Success: 97", "puppet-user: Total: 98", >> "puppet-user: Resources:", "puppet-user: Failed: 1", >> "puppet-user: Skipped: 41", "puppet-user: Changed: 97", >> "puppet-user: Out of sync: 98", "puppet-user: Total: >> 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", >> "puppet-user: Concat file: 0.00", "puppet-user: Anchor: >> 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: >> Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: >> Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", >> "puppet-user: Catalog application: 1.72", "puppet-user: Last >> run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: >> Total: 1.72", "puppet-user: Version:", "puppet-user: >> Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ >> '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit >> 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying >> running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed >> running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- >> Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 >> ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: >> 95117 -- ERROR configuring zaqar"]} >> >>> >> >> >>> >> Any suggestion would be grateful. >> >>> >> Regards, >> >>> >> Reza >> >>> >> >> >>> >> >> >>> >> >>> >> >> >> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sxmatch1986 at gmail.com Mon Jul 20 00:54:35 2020 From: sxmatch1986 at gmail.com (hao wang) Date: Mon, 20 Jul 2020 08:54:35 +0800 Subject: [Zaqar][Project-Update]Encrypted Messages in Queue is coming ! 
Message-ID: Hi, stackers, In Victoria, the queue in Zaqar will support to encrypt messages before storing them into storage backends, also could support to decrypt (or not) messages when those are claimed by consumer. This feature will enhance the security of the messaging service. As you may know, currently, Zaqar can't encrypt any messages and just store those messages into storage backends. That'll bring some security issues like information leakage or hacker attack. So depending on this feature, we have a basic mechanism to protect user's data from those security threats. Now we will support the encryption algorithm AES-256 at first, we consider bringing more capacities in the future like supporting RSA and letting users chose what they want to use. Finding more details in [1] and now the code patch is ready for reviewing [2]. Welcome to anyone who's interested in this new feature! [1]https://review.opendev.org/#/c/731102/4 [2]https://review.opendev.org/#/c/738736/ From katonalala at gmail.com Mon Jul 20 05:30:36 2020 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 20 Jul 2020 07:30:36 +0200 Subject: [neutron] Bug deputy report for week of July 13th Message-ID: Hi, I was Neutron bug deputy last week. A short summary of the reported bugs: - High bugs - https://bugs.launchpad.net/neutron/+bug/1887815 Neutron API - problem parsing semi-colon in Accept header - In Progress / Assigned - https://bugs.launchpad.net/neutron/+bug/1887992 [neutron-tempest-plugin] glance service failing during the installation - Fixed in Devstack - Medium bugs - https://bugs.launchpad.net/neutron/+bug/1887523 Deadlock detection code can be stale - In Progress (use oslo.b deadlock handling) => https://review.opendev.org/740977 - https://bugs.launchpad.net/neutron/+bug/1887281 [linuxbridge] ebtables delete arp protect chain fails - In Progress / Assigned => https://review.opendev.org/740588 - https://bugs.launchpad.net/neutron/+bug/1887405 Race condition while processing security_groups_member_updated events (ipset) - Unassigned - Low bugs - https://bugs.launchpad.net/neutron/+bug/1887778 neutron-ovn-migration-mtu does not support specifying project/user domain name - Assigned / In Progress => https://review.opendev.org/741410 - https://bugs.launchpad.net/neutron/+bug/1887781 neutron-ovn-migration-mtu does not adjust mtu on gre-networks - Assigned / In Progress => https://review.opendev.org/741414 - https://bugs.launchpad.net/neutron/+bug/1887385 String to byte conversion should provide the encoding type - Assigned / In progress => https://review.opendev.org/740693 - RFE - https://bugs.launchpad.net/neutron/+bug/1887497 Cleanup stale flows by cookie and table_id instead of just by cookie Only 1 is that still unassigned and some more checks are needed: https://bugs.launchpad.net/neutron/+bug/1887405 Lajos (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From isanjayk5 at gmail.com Mon Jul 20 06:40:34 2020 From: isanjayk5 at gmail.com (Sanjay K) Date: Mon, 20 Jul 2020 12:10:34 +0530 Subject: [stein][watcher] Running watcher with sample Audit templates Message-ID: Hello Watcher-dev, Would you please give any pointers on how to run watcher successfully based on my query posted at - ask.openstack.org If the metric cpu_util is deprecated in the past, how are these sample templates given in the Watcher Stein release going to work? Whether I have to do anything to replace these kinds of deprecated metrics in watcher? Thank you for your help and support. 
Best regards, Sanjay -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Mon Jul 20 08:04:21 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 20 Jul 2020 10:04:21 +0200 Subject: RabbitMQ consumer connection is refused when trying to read notifications In-Reply-To: References: Message-ID: <93DRDQ.XI6AYSMKLAZ8@est.tech> On Thu, Jul 16, 2020 at 14:41, Anil Jangam wrote: > Hi, > > I followed the video and the steps provided in this video link and > the consumer connection is being refused. > https://www.openstack.org/videos/summits/denver-2019/nova-versioned-notifications-the-result-of-a-3-year-journey > > /etc/nova/nova.conf file changes.. > [notifications] > notify_on_state_change=vm_state > default_level=INFO > notification_format=both > > [oslo_messaging_notifications] > driver=messagingv2 > transport_url=rabbit://guest:guest at 10.30.8.57:5672/ > topics=notification > retry=-1 > > The python consume code is as follows (followed the example provided > in the video: > transport = oslo_messaging.get_notification_transport( > cfg.CONF, url='rabbit://guest:guest at 10.30.8.57:5672/') > targets = [ > oslo_messaging.Target(topic='versioned_notifications'), > ] > Does the 'rabbit://guest:guest at 10.30.8.57:5672/' URL is a valid one in your deployment? Especially regarding the authentication. E.g. does 'sudo rabbitmqctl list_users' returns guest as a valid user? Does 'sudo rabbitmqctl authenticate_user guest guest' returns 'success'? In general if your nova deployment works (e.g. you can boot servers with nova) then your nova.conf [DEFAULT]/transport_url has a valid message bus config. And by default the notifications also uses that connection. However with [oslo_messaging_notifications]/transport_url you can redefine which message bus the notifications are emitted to. This could be important as by default (since cells v2) nova uses at least two message bus connection one between the API services and the super conductors and another for cell1 (and a separate one for each cell) between the cell conductors and and the computes in that cell[1]. So nova services emit the notification to the message bus they use therefore by default not all the notification is emitted to the same bus. In my demo I reconfigured [oslo_messaging_notifications]/transport_url for each nova service to point to the same message bus (the on at the API level) and connected the demo script to that bus. [1] https://docs.openstack.org/nova/latest/user/cellsv2-layout.html > Am I missing any other configuration in any of the services in > OpenStack? You don't need to change the configuration of other OpenStack service to make the demo script work. > > Let me know if you need any other info. > > /anil. 
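[Editor's note] As a concrete illustration of the point above, making every nova service emit to the same notification bus is just a matter of setting, in each service's nova.conf, something like:

[oslo_messaging_notifications]
driver = messagingv2
transport_url = rabbit://guest:guest@10.30.8.57:5672/

The URL here is a placeholder for whichever broker the demo listener connects to; the per-cell RPC configuration in [DEFAULT]/transport_url is left untouched.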
> Cheers, gibi From thierry at openstack.org Mon Jul 20 09:50:40 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 20 Jul 2020 11:50:40 +0200 Subject: [largescale-sig] Next meeting: July 22, 8utc Message-ID: <68f03b6d-9481-79d0-ae05-95de9e2eae48@openstack.org> Hi everyone, The Large Scale SIG will have a meeting this week on Wednesday, July 22 at 8 UTC[1] in the #openstack-meeting-3 channel on IRC: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200722T08 Feel free to add topics to our agenda at: https://etherpad.openstack.org/p/large-scale-sig-meeting A reminder of the TODOs we had from last meeting, in case you have time to make progress on them: - ttx to identify from the chat interested candidates from Opendev event and invite them to next meeting - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - amorin to start a thread on osarchiver proposing to land it somewhere in openstack - amorin to start a [largescale-sig] thread about his middleware ping approach, SIG members can comment if that makes sense for them Talk to you all on Wednesday, -- Thierry Carrez From balazs.gibizer at est.tech Mon Jul 20 12:17:27 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 20 Jul 2020 14:17:27 +0200 Subject: [qa][dev][all] Gate issues with devstack glance standalone w/o tls-proxy In-Reply-To: <173634bb16c.e1bb411a266116.18079246085083872@ghanshyammann.com> References: <173634bb16c.e1bb411a266116.18079246085083872@ghanshyammann.com> Message-ID: <3TORDQ.N1AI35NQ52T1@est.tech> On Sat, Jul 18, 2020 at 13:59, Ghanshyam Mann wrote: > ---- On Sat, 18 Jul 2020 04:38:20 -0500 Radosław Piliszek > wrote ---- > > Morning, Folks! > > > > It seems the devstack glance standalone mode (the new default) is > > broken at the moment if not using tls-proxy. If your jobs break on > > g-api not coming up, then this is the likely case. > > So far it seems to have hit Neutron and Nodepool jobs (and hence > also > > SDK and DIB for example). > > Please refrain from rechecking until solved. > > Fix is merged now, you can recheck. Does this fix fixed the grenade jobs or we need to merge some backports? I see a glance related grenade failure on nova master[1][2] pretty constantly. [1] https://review.opendev.org/#/c/728481/ [2] https://review.opendev.org/#/c/673341 Cheers, gibi > > -gmann > > > > > -yoctozepto > > > > > From juliaashleykreger at gmail.com Mon Jul 20 12:40:45 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 20 Jul 2020 05:40:45 -0700 Subject: [ironic] Cancelling meeting for this week Message-ID: Greetings everyone, Sorry for the late notice, but we have OpenDev[0] today which overlaps with our weekly meeting. Since the topic of OpenDev this week is on Hardware Automation, I anticipate most of us will be joining OpenDev this morning gathering new requirements and driving the discussion forward. Thanks everyone! Again, sorry for the late notice. 
-Julia -- [0] https://www.openstack.org/events/opendev-2020/opendev-schedule-2 From jasowang at redhat.com Mon Jul 20 03:41:47 2020 From: jasowang at redhat.com (Jason Wang) Date: Mon, 20 Jul 2020 11:41:47 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200717101258.65555978@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> Message-ID: <95c13c9b-daab-469b-f244-a0f741f1b41d@redhat.com> On 2020/7/18 上午12:12, Alex Williamson wrote: > On Thu, 16 Jul 2020 16:32:30 +0800 > Yan Zhao wrote: > >> On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: >>> On 2020/7/14 上午7:29, Yan Zhao wrote: >>>> hi folks, >>>> we are defining a device migration compatibility interface that helps upper >>>> layer stack like openstack/ovirt/libvirt to check if two devices are >>>> live migration compatible. >>>> The "devices" here could be MDEVs, physical devices, or hybrid of the two. >>>> e.g. we could use it to check whether >>>> - a src MDEV can migrate to a target MDEV, >>>> - a src VF in SRIOV can migrate to a target VF in SRIOV, >>>> - a src MDEV can migration to a target VF in SRIOV. >>>> (e.g. SIOV/SRIOV backward compatibility case) >>>> >>>> The upper layer stack could use this interface as the last step to check >>>> if one device is able to migrate to another device before triggering a real >>>> live migration procedure. >>>> we are not sure if this interface is of value or help to you. please don't >>>> hesitate to drop your valuable comments. >>>> >>>> >>>> (1) interface definition >>>> The interface is defined in below way: >>>> >>>> __ userspace >>>> /\ \ >>>> / \write >>>> / read \ >>>> ________/__________ ___\|/_____________ >>>> | migration_version | | migration_version |-->check migration >>>> --------------------- --------------------- compatibility >>>> device A device B >>>> >>>> >>>> a device attribute named migration_version is defined under each device's >>>> sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). >>> >>> Are you aware of the devlink based device management interface that is >>> proposed upstream? I think it has many advantages over sysfs, do you >>> consider to switch to that? > > Advantages, such as? 
My understanding for devlink(netlink) over sysfs (some are mentioned at the time of vDPA sysfs mgmt API discussion) are: I thought netlink was used more as a configuration protocol to query and configure NICs and, I guess, other devices in their devlink form, requiring a tool to be written that can speak the protocol to interact with them. The primary advantage of sysfs is that everything is just a file. There are no additional dependencies needed, and unlike netlink there are no interoperability issues in a containerised environment.
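[Editor's note] To make the "just a file" point concrete, the migration_version handshake described earlier in the thread can be driven from userspace with nothing but plain file I/O. The sketch below is illustrative only: the device paths are placeholders, and the error handling simply follows the rules quoted above (missing attribute, failed read, or failed write all mean "not compatible").

def devices_compatible(src_attr, dst_attr):
    """Return True if the target device accepts the source's version string."""
    try:
        with open(src_attr) as src:
            version = src.read()    # opaque, vendor-defined string
        with open(dst_attr, 'w') as dst:
            dst.write(version)      # target's vendor driver rejects the write if incompatible
    except OSError:
        return False                # missing attribute or read/write error => not compatible
    return True

# Placeholder paths -- real callers would derive these from the chosen devices.
SRC = '/sys/bus/pci/devices/0000:00:02.0/11111111-2222-3333-4444-555555555555/migration_version'
DST = '/sys/bus/pci/devices/0000:82:00.0/66666666-7777-8888-9999-000000000000/migration_version'
print('compatible' if devices_compatible(SRC, DST) else 'not compatible')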
if you are using diffrenet version of libc and gcc in the contaienr vs the host my understanding is tools like ethtool from ubuntu deployed in a container on a centos host can have issue communicating with the host kernel. if its jsut a file unless the format the data is returnin in chagnes or the layout of sysfs changes its compatiable regardless of what you use to read it. > > - existing users (NIC, crypto, SCSI, ib), mature and stable > - much better error reporting (ext_ack other than string or errno) > - namespace aware > - do not couple with kobject > > Thanks > From balazs.gibizer at est.tech Mon Jul 20 13:49:19 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 20 Jul 2020 15:49:19 +0200 Subject: [nova] spec review day on Tuesday 07-21 Message-ID: <72TRDQ.VXPYSS34F7K5@est.tech> Hi, As the spec freeze will happen next week at Milestone 2 we agreed[1] to have a dedicated spec review day on Tuesday (07-21). If you are a spec author then please prepare to react on the incoming feedback on your open spec. If you are reviewer then please focus on reviewing open specs during the day. Cheers, gibi [1] http://eavesdrop.openstack.org/meetings/nova/2020/nova.2020-07-16-16.00.log.html#l-40 From emilien at redhat.com Mon Jul 20 14:22:32 2020 From: emilien at redhat.com (Emilien Macchi) Date: Mon, 20 Jul 2020 10:22:32 -0400 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: I added Rabi to the tripleo-core group. Thanks all for your feedback, And again thanks Rabi for your hard work! On Wed, Jul 15, 2020 at 4:17 PM Brent Eagles wrote: > +1 definitely! > > On Tue, Jul 14, 2020 at 11:03 AM Emilien Macchi > wrote: > >> Hi folks, >> >> Rabi has proved deep technical understanding on the TripleO components >> over the last years. >> Initially as a major maintainer of the Heat project and then a regular >> contributor to TripleO, he got involved at different levels: >> - Optimization of the Heat templates, to reduce the number of resources >> or improve them to make it faster and more efficient at scale. >> - Migration of the Mistral workflows into native Ansible modules and >> Python code into tripleo-common, with end-to-end expertise. >> - Regular contributions to the container tooling integration. >> >> Being involved on the mailing-list and IRC channels, Rabi is always >> helpful to the community and here to help. >> He has provided thorough reviews in principal components on TripleO as >> well as a lot of bug fixes or new features; which contributed to make >> TripleO more stable and scalable. I would like to propose him be part of >> the TripleO core team. >> >> Thanks Rabi for your hard work! >> -- >> Emilien Macchi >> > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Mon Jul 20 14:27:19 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 20 Jul 2020 09:27:19 -0500 Subject: [qa][dev][all] Gate issues with devstack glance standalone w/o tls-proxy In-Reply-To: <3TORDQ.N1AI35NQ52T1@est.tech> References: <173634bb16c.e1bb411a266116.18079246085083872@ghanshyammann.com> <3TORDQ.N1AI35NQ52T1@est.tech> Message-ID: <1736c9f3225.129463c0e315641.999481687173104905@ghanshyammann.com> ---- On Mon, 20 Jul 2020 07:17:27 -0500 Balázs_Gibizer_ wrote ---- > > > On Sat, Jul 18, 2020 at 13:59, Ghanshyam Mann > wrote: > > ---- On Sat, 18 Jul 2020 04:38:20 -0500 Radosław Piliszek > > wrote ---- > > > Morning, Folks! 
> > > > > > It seems the devstack glance standalone mode (the new default) is > > > broken at the moment if not using tls-proxy. If your jobs break on > > > g-api not coming up, then this is the likely case. > > > So far it seems to have hit Neutron and Nodepool jobs (and hence > > also > > > SDK and DIB for example). > > > Please refrain from rechecking until solved. > > > > Fix is merged now, you can recheck. > > Does this fix fixed the grenade jobs or we need to merge some > backports? I see a glance related grenade failure on nova master[1][2] > pretty constantly. Yeah, legacy jobs are running with glance in standalone mode, zuulv3 native jobs were taken care in advance before devstack moved glance as standalone by default. Pushed the d-g patch - https://review.opendev.org/#/c/741955/ -gmann > > [1] https://review.opendev.org/#/c/728481/ > [2] https://review.opendev.org/#/c/673341 > > Cheers, > gibi > > > > > -gmann > > > > > > > > -yoctozepto > > > > > > > > > > > > From balazs.gibizer at est.tech Mon Jul 20 14:59:07 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 20 Jul 2020 16:59:07 +0200 Subject: [nova] Nova Gate is broken due to novnc issue in nova-next job Message-ID: Hi, Since Jul 18 the nova-next gate job is broken due to a novnc issue[1]. Hold your re-checks. Cheers, gibi [1] https://bugs.launchpad.net/nova/+bug/1888237 From balazs.gibizer at est.tech Mon Jul 20 16:35:28 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Mon, 20 Jul 2020 18:35:28 +0200 Subject: [nova] Nova Gate is broken due to novnc issue in nova-next job In-Reply-To: References: Message-ID: <4R0SDQ.J8CZ47O0J88S2@est.tech> On Mon, Jul 20, 2020 at 16:59, Balázs Gibizer wrote: > Hi, > > Since Jul 18 the nova-next gate job is broken due to a novnc > issue[1]. Hold your re-checks. A probable fix has been pushed [2]. Please still hold your rechecks until the fix merges. Cheers, gibi > > Cheers, > gibi > > [1] https://bugs.launchpad.net/nova/+bug/1888237 [2] https://review.opendev.org/#/c/741986/ > > > From miguel at mlavalle.com Mon Jul 20 16:55:54 2020 From: miguel at mlavalle.com (Miguel Lavalle) Date: Mon, 20 Jul 2020 11:55:54 -0500 Subject: [neutron] Cancelling Monday July 20th weekly team IRC meeting Message-ID: Dear Neutrinos, I have an agenda conflict so I won't be able to run the weekly team IRC meeting. Let's skip for this week Best regards Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Mon Jul 20 16:59:33 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Mon, 20 Jul 2020 22:29:33 +0530 Subject: [glance] issues with glance functional tests at gate Message-ID: Hello All, It seems that something merged/released during the past weekend (late Friday) causing failure of glance functional tests. We are working on fixing the same, till then please refrain from adding recheck on the patches. Thank you, Abhishek Kekane -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Mon Jul 20 18:33:50 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Mon, 20 Jul 2020 12:33:50 -0600 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: \0/ On Mon, Jul 20, 2020 at 8:24 AM Emilien Macchi wrote: > I added Rabi to the tripleo-core group. > Thanks all for your feedback, > > And again thanks Rabi for your hard work! 
> > On Wed, Jul 15, 2020 at 4:17 PM Brent Eagles wrote: > >> +1 definitely! >> >> On Tue, Jul 14, 2020 at 11:03 AM Emilien Macchi >> wrote: >> >>> Hi folks, >>> >>> Rabi has proved deep technical understanding on the TripleO components >>> over the last years. >>> Initially as a major maintainer of the Heat project and then a regular >>> contributor to TripleO, he got involved at different levels: >>> - Optimization of the Heat templates, to reduce the number of resources >>> or improve them to make it faster and more efficient at scale. >>> - Migration of the Mistral workflows into native Ansible modules and >>> Python code into tripleo-common, with end-to-end expertise. >>> - Regular contributions to the container tooling integration. >>> >>> Being involved on the mailing-list and IRC channels, Rabi is always >>> helpful to the community and here to help. >>> He has provided thorough reviews in principal components on TripleO as >>> well as a lot of bug fixes or new features; which contributed to make >>> TripleO more stable and scalable. I would like to propose him be part of >>> the TripleO core team. >>> >>> Thanks Rabi for your hard work! >>> -- >>> Emilien Macchi >>> >> > > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Mon Jul 20 20:26:07 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Tue, 21 Jul 2020 01:56:07 +0530 Subject: [glance] issues with glance functional tests at gate In-Reply-To: References: Message-ID: Hi All, Fix is up https://review.opendev.org/#/c/742022 Will update once it is merged. Thanks & Best Regards, Abhishek Kekane On Mon, Jul 20, 2020 at 10:29 PM Abhishek Kekane wrote: > Hello All, > > It seems that something merged/released during the past weekend (late > Friday) causing failure of glance functional tests. > > We are working on fixing the same, till then please refrain from adding > recheck on the patches. > > Thank you, > > Abhishek Kekane > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Tue Jul 21 03:04:55 2020 From: melwittt at gmail.com (melanie witt) Date: Mon, 20 Jul 2020 20:04:55 -0700 Subject: [nova] Nova Gate is broken due to novnc issue in nova-next job In-Reply-To: <4R0SDQ.J8CZ47O0J88S2@est.tech> References: <4R0SDQ.J8CZ47O0J88S2@est.tech> Message-ID: On 7/20/20 09:35, Balázs Gibizer wrote: > > > On Mon, Jul 20, 2020 at 16:59, Balázs Gibizer > wrote: >> Hi, >> >> Since Jul 18 the nova-next gate job is broken due to a novnc issue[1]. >> Hold your re-checks. > > A probable fix has been pushed [2]. Please still hold your rechecks > until the fix merges. Fix [2] has merged, it is now OK to recheck your changes. Cheers, -melanie >> [1] https://bugs.launchpad.net/nova/+bug/1888237 > > [2] https://review.opendev.org/#/c/741986/ From ramishra at redhat.com Tue Jul 21 04:31:18 2020 From: ramishra at redhat.com (Rabi Mishra) Date: Tue, 21 Jul 2020 10:01:18 +0530 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: On Mon, Jul 20, 2020 at 7:58 PM Emilien Macchi wrote: > I added Rabi to the tripleo-core group. > Thanks all for your feedback, > > And again thanks Rabi for your hard work! > > Thanks Emilien and others for your generous words and feedback! > On Wed, Jul 15, 2020 at 4:17 PM Brent Eagles wrote: > >> +1 definitely! 
>> >> On Tue, Jul 14, 2020 at 11:03 AM Emilien Macchi >> wrote: >> >>> Hi folks, >>> >>> Rabi has proved deep technical understanding on the TripleO components >>> over the last years. >>> Initially as a major maintainer of the Heat project and then a regular >>> contributor to TripleO, he got involved at different levels: >>> - Optimization of the Heat templates, to reduce the number of resources >>> or improve them to make it faster and more efficient at scale. >>> - Migration of the Mistral workflows into native Ansible modules and >>> Python code into tripleo-common, with end-to-end expertise. >>> - Regular contributions to the container tooling integration. >>> >>> Being involved on the mailing-list and IRC channels, Rabi is always >>> helpful to the community and here to help. >>> He has provided thorough reviews in principal components on TripleO as >>> well as a lot of bug fixes or new features; which contributed to make >>> TripleO more stable and scalable. I would like to propose him be part of >>> the TripleO core team. >>> >>> Thanks Rabi for your hard work! >>> -- >>> Emilien Macchi >>> >> > > -- > Emilien Macchi > -- Regards, Rabi Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykarel at redhat.com Tue Jul 21 08:15:23 2020 From: ykarel at redhat.com (Yatin Karel) Date: Tue, 21 Jul 2020 13:45:23 +0530 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: Hi, On Sun, Jul 19, 2020 at 12:41 PM Reza Bakhshayeshi wrote: > > As Ruslanas guided, the problem was solved by disabling gpgcheck. For me there was no need of enabling HA repos. > I think this process should be reported as a bug. > > Unfortunately, now my overcloud installation fails with: > > ... > TASK [tripleo_podman : ensure podman and deps are installed] ******************* > task path: /usr/share/ansible/roles/tripleo_podman/tasks/tripleo_podman_install.yml:21 > Saturday 18 July 2020 15:04:29 +0430 (0:00:00.193) 0:04:37.581 ********* > Running dnf > Using module file /usr/lib/python3.6/site-packages/ansible/modules/packaging/os/dnf.py > ... > fatal: [overcloud-controller-0]: FAILED! => changed=false > failures: > - No package buildah available. > invocation: > module_args: > allow_downgrade: false > autoremove: false > bugfix: false > conf_file: null > disable_excludes: null > disable_gpg_check: false > disable_plugin: [] > disablerepo: [] > download_dir: null > download_only: false > enable_plugin: [] > enablerepo: [] > exclude: [] > install_repoquery: true > install_weak_deps: true > installroot: / > list: null > lock_timeout: 30 > name: > - podman > - buildah > releasever: null > security: false > skip_broken: false > state: latest > update_cache: false > update_only: false > validate_certs: true > msg: Failed to install some of the specified packages > rc: 1 > results: [] > ... > > Do you think the above error is something related to repos? The issue can happen when repos are not configured on overcloud nodes, but in this particular case buildah is not needed on overcloud nodes, which is fixed already[1], can u try again with latest repos. [1] https://review.opendev.org/#/q/Ibb91dfa9684b481dea34607fc47c0d531d56ee45 > > On Tue, 14 Jul 2020 at 18:20, Ruslanas Gžibovskis wrote: >> >> I am not sure, but that might help. 
I use these steps for deployment: >> >> cp -ar /etc/yum.repos.d repos >> sed -i s/gpgcheck=1/gpgcheck=0/g repos/*repo >> export DIB_YUM_REPO_CONF="$(ls /home/stack/repos/*repo)" >> export STABLE_RELEASE="ussuri" >> export OS_YAML="/usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml" >> source /home/stack/stackrc >> mkdir /home/stack/images >> cd /home/stack/images >> openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml && openstack overcloud image upload --update-existing >> cd /home/stack >> ls /home/stack/images >> >> this works for all packages except: >> >> pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs >> >> to solve these you need to enable in repos dir HA repo (change in enable=0 to enable=1 >> and then this will solve you issues with most except: osops-tools-monitoring-oschecks >> >> this one, you can change by: >> modify line in file: >> /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map >> to have this line: >> "oschecks_package": "sysstat" >> instead of "oschecks_package": "osops-tools-monitoring-oschecks >> >> " >> >> >> >> >> On Tue, 14 Jul 2020 at 15:14, Alex Schultz wrote: >>> >>> On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi wrote: >>> > >>> > Thanks for your information. >>> > Actually, I was in doubt of using Ussuri (latest version) for my environment. >>> > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but overcloud image build got some error: >>> > >>> > $ openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml >>> > >>> > ... >>> > 2020-07-14 12:14:22.714 | Running install-packages install. >>> > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs >>> > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync >>> > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 >>> > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum >>> > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux 8; generic; Linux.x86_64)' >>> > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream >>> > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 23:25:16 2020. >>> > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS >>> > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 23:25:12 2020. 
>>> > 2020-07-14 12:14:23.517 | repo: using cache for: extras >>> > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 00:15:26 2020. >>> > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on Tue Jul 14 11:43:38 2020. >>> > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion cache... >>> > 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient >>> > 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient >>> > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient >>> > 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient >>> > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient >>> > 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient >>> > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient >>> > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient >>> > 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient >>> > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient >>> > 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient >>> > 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient >>> > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient >>> > 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient >>> > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient >>> > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient >>> > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient >>> > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient >>> > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is already installed. >>> > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already installed. >>> > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote >>> > 2020-07-14 12:14:23.929 | No match for argument: osops-tools-monitoring-oschecks >>> > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker >>> > 2020-07-14 12:14:23.936 | No match for argument: crudini >>> > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux >>> > 2020-07-14 12:14:23.953 | No match for argument: pacemaker >>> > 2020-07-14 12:14:23.957 | No match for argument: pcs >>> > 2020-07-14 12:14:23.961 | Error: Unable to find a match: python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs >>> > >>> > Do you have any idea? >>> > >>> >>> Seems like you are missing the correct DIP_YUM_REPO_CONF setting per >>> #3 from https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images >>> >>> > >>> > >>> > On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: >>> >> >>> >> Hi folks, >>> >> >>> >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: >>> >>> >>> >>> I don't believe centos8 containers are available for Train yet. 
The >>> >>> error you're hitting is because it's fetching centos7 containers and >>> >>> the ironic container is not backwards compatible between the two >>> >>> versions. If you want centos8, use Ussuri. >>> >>> >>> >> >>> >> fyi we started pushing centos8 train last week - slightly different namespace - latest current-tripleo containers are pushed to https://hub.docker.com/u/tripleotraincentos8 >>> >> >>> >> hope it helps >>> >> >>> >>> >>> >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi wrote: >>> >>> > >>> >>> > I found following error in ironic and container-puppet-ironic container log during installation: >>> >>> > >>> >>> > puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 >>> >>> > >>> >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: >>> >>> >> >>> >>> >> Hi, >>> >>> >> >>> >>> >> I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: >>> >>> >> >>> >>> >> "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed 
dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} >>> >>> >> >>> >>> >> Any suggestion would be grateful. >>> >>> >> Regards, >>> >>> >> Reza >>> >>> >> >>> >>> >> >>> >>> >>> >>> >>> >>> >> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 Thanks and Regards Yatin Karel From jpena at redhat.com Tue Jul 21 08:25:24 2020 From: jpena at redhat.com (Javier Pena) Date: Tue, 21 Jul 2020 04:25:24 -0400 (EDT) Subject: [infra] CentOS support for mirror role in system-config In-Reply-To: <245857746.42287229.1595319355188.JavaMail.zimbra@redhat.com> Message-ID: <287457836.42289622.1595319924417.JavaMail.zimbra@redhat.com> Hi all, TL;DR: I have proposed a set of changes to add CentOS support to the mirror role in system-config with [1] and would appreciate reviews. Long version: the RDO project maintains a set of mirrors that mimic those provided by the OpenDev Infra team to jobs running in review.opendev.org. The reason for this is to provide the same environment for TripleO jobs to run on both OpenDev's and RDO's Gerrit platforms. Previously, we used the Puppet modules from the system-config repo, together with some unmerged changes to move that support to puppet-openstackci, as it was suggested during the review process [2]. Once those modules were obsoleted, we have proposed a set of changes to the mirror ansible role [1] to add that support. 
I would appreciate reviews on those changes (thanks Ian for the first reviews!). Some of them are small bugfixes to fix the already existing CentOS support, while [3] is the one targeting the mirror role. Thanks, Javier [1] - https://review.opendev.org/#/q/status:open+project:opendev/system-config+branch:master+topic:mirror-centos [2] - https://review.opendev.org/#/q/status:open+project:opendev/puppet-openstackci+branch:master+topic:afs-mirror-centos [3] - https://review.opendev.org/736996 From anlin.kong at gmail.com Tue Jul 21 10:19:57 2020 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 21 Jul 2020 22:19:57 +1200 Subject: [nova] spec review day on Tuesday 07-21 In-Reply-To: <72TRDQ.VXPYSS34F7K5@est.tech> References: <72TRDQ.VXPYSS34F7K5@est.tech> Message-ID: Hi Balázs, May I ask why in Nova even user with admin role can't create VM using other user's Neutron port? The use case is in some services like Octavia and Trove, the service tenant user would like to create Service VMs (which are invisible to end users) but using the end user's openstack resources such as Neutron ports, giving the end user ability to customize the security group rules, e.g. allow IP addresses to access the load balancing service or database service. --- Lingxian Kong Senior Software Engineer Catalyst Cloud www.catalystcloud.nz On Tue, Jul 21, 2020 at 1:54 AM Balázs Gibizer wrote: > Hi, > > As the spec freeze will happen next week at Milestone 2 we agreed[1] to > have a dedicated spec review day on Tuesday (07-21). If you are a spec > author then please prepare to react on the incoming feedback on your > open spec. If you are reviewer then please focus on reviewing open > specs during the day. > > Cheers, > gibi > > [1] > > http://eavesdrop.openstack.org/meetings/nova/2020/nova.2020-07-16-16.00.log.html#l-40 > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.dibbo at stfc.ac.uk Tue Jul 21 12:53:04 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Tue, 21 Jul 2020 12:53:04 +0000 Subject: Magnum: invalid format of client version Message-ID: Hi, I have just deployed magnum into my train enviroment and am seeing the following error when creating any kind of cluster: This is a Train environment deployed from RDO packages (9.4.0-1). Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server [req-aa9ce18b-64eb-40ad-b1c0-b7c312402780 - - - - -] Exception during message handling: InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. 
Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server Traceback (most recent call last): Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 160, in wrapper Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 95, in cluster_create Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server raise e Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server #033[00m When logging in debug, I see a dump of a huge heat template and a 400 bad request from heatclient (as below) immediately before the above log excerpt: {"explanation": "The server could not comply with the request since it is either malformed or otherwise incorrect.", "code": 400, "error": {"message": "UnsupportedVersion: : resources.master_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version.", "traceback": null, "type": "StackValidationFailed"}, "title": "Bad Request"} log_http_response /usr/lib/python2.7/site-packages/heatclient/common/http.py:157 More details are available in my question here: https://ask.openstack.org/en/question/128520/magnum-invalid-format-of-client-version/ Any suggestions on where to look to set the client version it is complaining about would be much appreciated? 
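One way to narrow this down, without guessing at the root cause: the debug log already dumps the generated Heat template, so it can be saved to a file and validated against Heat directly. If the same UnsupportedVersion error comes back, that would point at Heat's handling of the server-group resources rather than at Magnum itself. A rough sketch, with a hypothetical file name:

# save the template dumped in the Magnum debug log, then validate it with Heat
openstack orchestration template validate -t /tmp/magnum-k8s-template.yaml

# the reported client version is empty, so the client libraries installed on the
# Heat engine host are also worth a quick look
pip show python-novaclient python-heatclient | grep -E '^(Name|Version)'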
Thanks Alex Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From grant at civo.com Tue Jul 21 13:04:28 2020 From: grant at civo.com (Grant Morley) Date: Tue, 21 Jul 2020 14:04:28 +0100 Subject: CentOS unrecoverable after Ceph issues Message-ID: <344ac601-6896-8fee-d1f9-98e7ea93e801@civo.com> Hi all, We recently had an issue with our ceph cluster which ended up going into "Error" status after some drive failures. The system stopped allowing writes for a while whilst it recovered. The ceph cluster is healthy again but we seem to have a few instances that have corrupt filesystems on them. They are all CentOS 7 instances. We have got them into rescue mode to try and repair the FS with "xfs_repair -L" However when we do that we get this: 973.026283] XFS (vdb1): Mounting V5 Filesystem [ 973.203261] blk_update_request: I/O error, dev vdb, sector 8389693 [ 973.204746] blk_update_request: I/O error, dev vdb, sector 8390717 [ 973.206136] blk_update_request: I/O error, dev vdb, sector 8391741 [ 973.207608] blk_update_request: I/O error, dev vdb, sector 8392765 [ 973.209544] XFS (vdb1): xfs_do_force_shutdown(0x1) called from line 1236 of file fs/xfs/xfs_buf.c. Return address = 0xffffffffc017a50c [ 973.212137] XFS (vdb1): I/O Error Detected. Shutting down filesystem [ 973.213429] XFS (vdb1): Please umount the filesystem and rectify the problem(s) [ 973.215036] XFS (vdb1): metadata I/O error: block 0x7ffc3d ("xlog_bwrite") error 5 numblks 8192 [ 973.217201] XFS (vdb1): failed to locate log tail [ 973.218239] XFS (vdb1): log mount/recovery failed: error -5 [ 973.219865] XFS (vdb1): log mount failed [ 973.233792] blk_update_request: I/O error, dev vdb, sector 0 Interestingly any debian based instances we could recover. It just seems to be CentOS and having XFS on CentOS and ceph the instances don't seem happy. This seems more low level to me in ceph rather than a corrupt FS on a guest. Does anyone know of any "ceph tricks" that we can use to try and at least get an "xfs_repair" running? Many thanks, -- Grant Morley Cloud Lead, Civo Ltd www.civo.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yan.y.zhao at intel.com Tue Jul 21 00:51:13 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Tue, 21 Jul 2020 08:51:13 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200717101258.65555978@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> Message-ID: <20200721005113.GA10502@joy-OptiPlex-7040> On Fri, Jul 17, 2020 at 10:12:58AM -0600, Alex Williamson wrote: <...> > > yes, in another reply, Alex proposed to use an interface in json format. > > I guess we can define something like > > > > { "self" : > > [ > > { "pciid" : "8086591d", > > "driver" : "i915", > > "gvt-version" : "v1", > > "mdev_type" : "i915-GVTg_V5_2", > > "aggregator" : "1", > > "pv-mode" : "none", > > } > > ], > > "compatible" : > > [ > > { "pciid" : "8086591d", > > "driver" : "i915", > > "gvt-version" : "v1", > > "mdev_type" : "i915-GVTg_V5_2", > > "aggregator" : "1" > > "pv-mode" : "none", > > }, > > { "pciid" : "8086591d", > > "driver" : "i915", > > "gvt-version" : "v1", > > "mdev_type" : "i915-GVTg_V5_4", > > "aggregator" : "2" > > "pv-mode" : "none", > > }, > > { "pciid" : "8086591d", > > "driver" : "i915", > > "gvt-version" : "v2", > > "mdev_type" : "i915-GVTg_V5_4", > > "aggregator" : "2" > > "pv-mode" : "none, ppgtt, context", > > } > > ... > > ] > > } > > > > But as those fields are mostly vendor specific, the userspace can > > only do simple string comparing, I guess the list would be very long as > > it needs to enumerate all possible targets. > > > This ignores so much of what I tried to achieve in my example :( > sorry, I just was eager to show and confirm the way to list all compatible combination of mdev_type and mdev attributes. > > > also, in some fileds like "gvt-version", is there a simple way to express > > things like v2+? > > > That's not a reasonable thing to express anyway, how can you be certain > that v3 won't break compatibility with v2? Sean proposed a versioning > scheme that accounts for this, using an x.y.z version expressing the > major, minor, and bugfix versions, where there is no compatibility > across major versions, minor versions have forward compatibility (ex. 1 > -> 2 is ok, 2 -> 1 is not) and bugfix version number indicates some > degree of internal improvement that is not visible to the user in terms > of features or compatibility, but provides a basis for preferring > equally compatible candidates. > right. if self version is v1, it can't know its compatible version is v2. it can only be done in reverse. i.e. when self version is v2, it can list its compatible version is v1 and v2. and maybe later when self version is v3, there's no v1 in its compatible list. In this way, do you think we still need the complex x.y.z versioning scheme? > > > If the userspace can read this interface both in src and target and > > check whether both src and target are in corresponding compatible list, I > > think it will work for us. > > > > But still, kernel should not rely on userspace's choice, the opaque > > compatibility string is still required in kernel. No matter whether > > it would be exposed to userspace as an compatibility checking interface, > > vendor driver would keep this part of code and embed the string into the > > migration stream. 
so exposing it as an interface to be used by libvirt to > > do a safety check before a real live migration is only about enabling > > the kernel part of check to happen ahead. > > As you indicate, the vendor driver is responsible for checking version > information embedded within the migration stream. Therefore a > migration should fail early if the devices are incompatible. Is it but as I know, currently in VFIO migration protocol, we have no way to get vendor specific compatibility checking string in migration setup stage (i.e. .save_setup stage) before the device is set to _SAVING state. In this way, for devices who does not save device data in precopy stage, the migration compatibility checking is as late as in stop-and-copy stage, which is too late. do you think we need to add the getting/checking of vendor specific compatibility string early in save_setup stage? > really libvirt's place to second guess what it has been directed to do? if libvirt uses the scheme of reading compatibility string at source and writing for checking at the target, it can not be called "a second guess". It's not a guess, but a confirmation. > Why would we even proceed to design a user parse-able version interface > if we still have a dependency on an opaque interface? Thanks, one reason is that libvirt can't trust the parsing result from openstack. Another reason is that libvirt can use this opaque interface easier than another parsing by itself, in the fact that it would not introduce more burden to kernel who would write this part of code anyway, no matter libvirt uses it or not. Thanks Yan From jasowang at redhat.com Tue Jul 21 02:11:24 2020 From: jasowang at redhat.com (Jason Wang) Date: Tue, 21 Jul 2020 10:11:24 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <60d5c1609aaef72f2877aaacff04dc7187e4b3a5.camel@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <95c13c9b-daab-469b-f244-a0f741f1b41d@redhat.com> <60d5c1609aaef72f2877aaacff04dc7187e4b3a5.camel@redhat.com> Message-ID: <22599bc3-cb22-7a62-d463-9a53714def57@redhat.com> On 2020/7/20 下午6:39, Sean Mooney wrote: > On Mon, 2020-07-20 at 11:41 +0800, Jason Wang wrote: >> On 2020/7/18 上午12:12, Alex Williamson wrote: >>> On Thu, 16 Jul 2020 16:32:30 +0800 >>> Yan Zhao wrote: >>> >>>> On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: >>>>> On 2020/7/14 上午7:29, Yan Zhao wrote: >>>>>> hi folks, >>>>>> we are defining a device migration compatibility interface that helps upper >>>>>> layer stack like openstack/ovirt/libvirt to check if two devices are >>>>>> live migration compatible. >>>>>> The "devices" here could be MDEVs, physical devices, or hybrid of the two. >>>>>> e.g. we could use it to check whether >>>>>> - a src MDEV can migrate to a target MDEV, >>>>>> - a src VF in SRIOV can migrate to a target VF in SRIOV, >>>>>> - a src MDEV can migration to a target VF in SRIOV. >>>>>> (e.g. SIOV/SRIOV backward compatibility case) >>>>>> >>>>>> The upper layer stack could use this interface as the last step to check >>>>>> if one device is able to migrate to another device before triggering a real >>>>>> live migration procedure. >>>>>> we are not sure if this interface is of value or help to you. please don't >>>>>> hesitate to drop your valuable comments. 
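(A side note on the "simple string comparing" step discussed above: a minimal sketch of that userspace check, assuming the self/compatible JSON layout proposed earlier in the thread and a purely hypothetical sysfs path for it, could look like this.)

# both paths below are hypothetical placeholders for wherever the JSON ends up
SRC_JSON=/sys/bus/mdev/devices/$SRC_UUID/migration_compat
DST_JSON=/sys/bus/mdev/devices/$DST_UUID/migration_compat

src_self=$(jq -c '.self[0]' "$SRC_JSON")

# plain field-by-field equality, as described above: the target is acceptable if
# its "compatible" list contains an entry identical to the source's "self" entry
if jq -e --argjson s "$src_self" '.compatible[] | select(. == $s)' "$DST_JSON" >/dev/null; then
    echo "devices look migration compatible"
else
    echo "no matching entry in target compatible list"
fi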
>>>>>> >>>>>> >>>>>> (1) interface definition >>>>>> The interface is defined in below way: >>>>>> >>>>>> __ userspace >>>>>> /\ \ >>>>>> / \write >>>>>> / read \ >>>>>> ________/__________ ___\|/_____________ >>>>>> | migration_version | | migration_version |-->check migration >>>>>> --------------------- --------------------- compatibility >>>>>> device A device B >>>>>> >>>>>> >>>>>> a device attribute named migration_version is defined under each device's >>>>>> sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). >>>>> Are you aware of the devlink based device management interface that is >>>>> proposed upstream? I think it has many advantages over sysfs, do you >>>>> consider to switch to that? >>> Advantages, such as? >> >> My understanding for devlink(netlink) over sysfs (some are mentioned at >> the time of vDPA sysfs mgmt API discussion) are: > i tought netlink was used more a as a configuration protocoal to qurry and confire nic and i guess > other devices in its devlink form requireint a tool to be witten that can speak the protocal to interact with. > the primary advantate of sysfs is that everything is just a file. there are no addtional depleenceis > needed Well, if you try to build logic like introspection on top for a sophisticated hardware, you probably need to have library on top. And it's attribute per file is pretty inefficient. > and unlike netlink there are not interoperatblity issues in a coanitnerised env. if you are using diffrenet > version of libc and gcc in the contaienr vs the host my understanding is tools like ethtool from ubuntu deployed > in a container on a centos host can have issue communicating with the host kernel. Kernel provides stable ABI for userspace, so it's not something that we can't fix. > if its jsut a file unless > the format the data is returnin in chagnes or the layout of sysfs changes its compatiable regardless of what you > use to read it. I believe you can't change sysfs layout which is part of uABI. But as I mentioned below, sysfs has several drawbacks. It's not harm to compare between different approach when you start a new device management API. Thanks >> - existing users (NIC, crypto, SCSI, ib), mature and stable >> - much better error reporting (ext_ack other than string or errno) >> - namespace aware >> - do not couple with kobject >> >> Thanks >> From alexander.dibbo at stfc.ac.uk Tue Jul 21 07:55:51 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Tue, 21 Jul 2020 07:55:51 +0000 Subject: Magnum: invalid format of client version Message-ID: <84b00bab224a40d883f46545042b236b@stfc.ac.uk> Hi, I have just deployed magnum into my train enviroment and am seeing the following error when creating any kind of cluster: This is a Train environment deployed from RDO packages (9.4.0-1). Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server [req-aa9ce18b-64eb-40ad-b1c0-b7c312402780 - - - - -] Exception during message handling: InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. 
Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server Traceback (most recent call last): Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 160, in wrapper Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 95, in cluster_create Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server raise e Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server #033[00m When logging in debug, I see a dump of a huge heat template and a 400 bad request from heatclient (as below) immediately before the above log excerpt: {"explanation": "The server could not comply with the request since it is either malformed or otherwise incorrect.", "code": 400, "error": {"message": "UnsupportedVersion: : resources.master_nodes_server_group: : Invalid format of client version ''. 
Expected format 'X.Y', where X is a major part and Y is a minor part of version.", "traceback": null, "type": "StackValidationFailed"}, "title": "Bad Request"} log_http_response /usr/lib/python2.7/site-packages/heatclient/common/http.py:157 Here is my config file: [DEFAULT] auth_strategy = keystone debug = true memcached_servers = dev-service1.nubes.rl.ac.uk:11211 my_ip = 172.16.103.43 rpc_backend = rabbit stack_domain_admin = magnum stack_domain_admin_password = MAGNUM_PASS stack_user_domain_name = magnum transport_url = rabbit://openstack:rabbit-pass at dev-rabbit4.nubes.rl.ac.uk:5672,openstack:rabbit-pass at dev-rabbit5.nubes.rl.ac.uk:5672,openstack:rabbit-pass at dev-rabbit6.nubes.rl.ac.uk:5672/ verbose = true [api] host = 172.16.103.43 [barbican_client] endpoint_type = public region_name = RegionOne [cache] backend = oslo_cache.memcache_pool enabled = true memcache_servers = dev-service1.nubes.rl.ac.uk:11211 [certificates] cert_manager_type = x509keypair [cinder_client] endpoint_type = public region_name = RegionOne [database] connection = mysql+pymysql://magnum:MAGNUM_DBPASS at dev-openstack.stfc.ac.uk:3306/magnum connection_recycle_time = 3600 [glance_client] endpoint_type = public region_name = RegionOne [heat_client] endpoint_type = public region_name = RegionOne [keystone_authtoken] admin_password = MAGNUM_PASS admin_tenant_name = service admin_user = magnum auth_plugin = password auth_type = password auth_uri = https://dev-openstack.stfc.ac.uk:5000 auth_url = https://dev-openstack.stfc.ac.uk:5000 insecure = false password = MAGNUM_PASS project_domain_name = default project_name = service user_domain_name = default username = magnum www_authenticate_uri = https://dev-openstack.stfc.ac.uk:5000 [magnum_client] endpoint_type = public region_name = RegionOne [neutron_client] endpoint_type = public region_name = RegionOne [nova_client] endpoint_type = public region_name = RegionOne [octavia_client] endpoint_type = public region_name = RegionOne [oslo_concurrency] lock_path = /var/lib/magnum/tmp [oslo_middleware] enable_proxy_headers_parsing = true [oslo_messaging_notifications] driver = messaging [trust] trustee_domain_admin_name = magnum_domain_admin trustee_domain_admin_password = MAGNUM_DOMAIN_ADMIN_PASS trustee_domain_name = magnum trustee_keystone_interface = public Any suggestions on where to look to set the client version it is complaining about would be much appreciated? Thanks Alex Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. 
UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From romain.chanu at univ-lyon1.fr Tue Jul 21 13:45:18 2020 From: romain.chanu at univ-lyon1.fr (CHANU ROMAIN) Date: Tue, 21 Jul 2020 13:45:18 +0000 Subject: CentOS unrecoverable after Ceph issues In-Reply-To: <344ac601-6896-8fee-d1f9-98e7ea93e801@civo.com> References: <344ac601-6896-8fee-d1f9-98e7ea93e801@civo.com> Message-ID: <02f5e9fb1d6df3179e7ae856df7326c8a8499cb3.camel@univ-lyon1.fr> Hello, I do not use CentOS and XFS but I had a simillar issue after an outrage. Ceph didnt release the lock on rados block device. You can check if you are facing the same issue than I did. You have to shutdown your instance then type this command: rbd -p your-pool-name lock list instance-volume-id The command should not return any output if your instance is shut. If you got an output about 1 exclusive lock just remove it: rbd -p your-pool-name lock remove instance-volume-id Best Regards,Romain On Tue, 2020-07-21 at 14:04 +0100, Grant Morley wrote: > Hi all, > We recently had an issue with our ceph cluster which ended up > going into "Error" status after some drive failures. The system > stopped allowing writes for a while whilst it recovered. The > ceph > cluster is healthy again but we seem to have a few instances > that > have corrupt filesystems on them. They are all CentOS 7 > instances. > We have got them into rescue mode to try and repair the FS with > "xfs_repair -L" However when we do that we get this: > 973.026283] > XFS (vdb1): Mounting V5 Filesystem > > [ 973.203261] blk_update_request: I/O error, dev vdb, sector > 8389693 > > [ 973.204746] blk_update_request: I/O error, dev vdb, sector > 8390717 > > [ 973.206136] blk_update_request: I/O error, dev vdb, sector > 8391741 > > [ 973.207608] blk_update_request: I/O error, dev vdb, sector > 8392765 > > [ 973.209544] XFS (vdb1): xfs_do_force_shutdown(0x1) called > from > line 1236 of file fs/xfs/xfs_buf.c. Return address = > 0xffffffffc017a50c > > [ 973.212137] XFS (vdb1): I/O Error Detected. Shutting down > filesystem > > [ 973.213429] XFS (vdb1): Please umount the filesystem and > rectify the problem(s) > > [ 973.215036] XFS (vdb1): metadata I/O error: block 0x7ffc3d > ("xlog_bwrite") error 5 numblks 8192 > > [ 973.217201] XFS (vdb1): failed to locate log tail > > [ 973.218239] XFS (vdb1): log mount/recovery failed: error -5 > > [ 973.219865] XFS (vdb1): log mount failed > > [ 973.233792] blk_update_request: I/O error, dev vdb, sector > 0 > Interestingly > any debian based instances we could recover. It just seems to > be > CentOS and having XFS on CentOS and ceph the instances don't > seem happy. This seems more low level to me in ceph rather > than > a corrupt FS on a guest. > Does anyone > know of any "ceph tricks" that we can use to try and at least > get an "xfs_repair" running? > Many thanks, > > > > -- > > > Grant Morley > > Cloud Lead, Civo Ltd > > www.civo.com > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 4843 bytes Desc: not available URL: From grant at civo.com Tue Jul 21 14:39:26 2020 From: grant at civo.com (Grant Morley) Date: Tue, 21 Jul 2020 15:39:26 +0100 Subject: CentOS unrecoverable after Ceph issues In-Reply-To: <02f5e9fb1d6df3179e7ae856df7326c8a8499cb3.camel@univ-lyon1.fr> References: <344ac601-6896-8fee-d1f9-98e7ea93e801@civo.com> <02f5e9fb1d6df3179e7ae856df7326c8a8499cb3.camel@univ-lyon1.fr> Message-ID: <0c1679b5-dd1a-0682-541e-cf0e842df619@civo.com> Hi, That has done the trick! Thank you so much for your help. Regards, Grant On 21/07/2020 14:45, CHANU ROMAIN wrote: > Hello, > > I do not use CentOS and XFS but I had a simillar issue after an > outrage. Ceph didnt release the lock on rados block device. You can > check if you are facing the same issue than I did. You have to > shutdown your instance then type this command: > > rbd -p your-pool-name lock list instance-volume-id > > The command should not return any output if your instance is shut. If > you got an output about 1 exclusive lock just remove it: > > rbd -p your-pool-name lock remove instance-volume-id > > Best Regards, > Romain > > On Tue, 2020-07-21 at 14:04 +0100, Grant Morley wrote: >> >> Hi all, >> >> We recently had an issue with our ceph cluster which ended up going >> into "Error" status after some drive failures. The system stopped >> allowing writes for a while whilst it recovered. The ceph cluster is >> healthy again but we seem to have a few instances that have corrupt >> filesystems on them. They are all CentOS 7 instances. We have got >> them into rescue mode to try and repair the FS with "xfs_repair -L" >> However when we do that we get this: >> >> 973.026283] XFS (vdb1): Mounting V5 Filesystem >> [ 973.203261] blk_update_request: I/O error, dev vdb, sector 8389693 >> [ 973.204746] blk_update_request: I/O error, dev vdb, sector 8390717 >> [ 973.206136] blk_update_request: I/O error, dev vdb, sector 8391741 >> [ 973.207608] blk_update_request: I/O error, dev vdb, sector 8392765 >> [ 973.209544] XFS (vdb1): xfs_do_force_shutdown(0x1) called from line >> 1236 of file fs/xfs/xfs_buf.c. Return address = 0xffffffffc017a50c >> [ 973.212137] XFS (vdb1): I/O Error Detected. Shutting down filesystem >> [ 973.213429] XFS (vdb1): Please umount the filesystem and rectify >> the problem(s) >> [ 973.215036] XFS (vdb1): metadata I/O error: block 0x7ffc3d >> ("xlog_bwrite") error 5 numblks 8192 >> [ 973.217201] XFS (vdb1): failed to locate log tail >> [ 973.218239] XFS (vdb1): log mount/recovery failed: error -5 >> [ 973.219865] XFS (vdb1): log mount failed >> [ 973.233792] blk_update_request: I/O error, dev vdb, sector 0 >> >> Interestingly any debian based instances we could recover. It just >> seems to be CentOS and having XFS on CentOS and ceph the instances >> don't seem happy. This seems more low level to me in ceph rather than >> a corrupt FS on a guest. >> >> Does anyone know of any "ceph tricks" that we can use to try and at >> least get an "xfs_repair" running? >> >> Many thanks, >> >> >> -- >> Grant Morley >> Cloud Lead, Civo Ltd >> www.civo.com -- Grant Morley Cloud Lead, Civo Ltd www.civo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Jul 21 14:43:16 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 21 Jul 2020 07:43:16 -0700 Subject: [all][TC] New Office Hours Times In-Reply-To: References: Message-ID: Hello! 
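For anyone else hitting the same symptom after a Ceph outage, a rough sketch of scanning a whole pool for leftover exclusive locks, based on the rbd lock commands above (the pool name is a placeholder, and a lock should only be removed once the instance is confirmed to be shut down):

POOL=volumes   # placeholder; use the pool backing the affected volumes
for IMG in $(rbd -p "$POOL" ls); do
    LOCKS=$(rbd -p "$POOL" lock list "$IMG")
    if [ -n "$LOCKS" ]; then
        echo "=== $IMG ==="
        echo "$LOCKS"
    fi
done

# note: rbd lock remove normally also takes the lock id and locker reported by
# "lock list", e.g.: rbd -p "$POOL" lock remove <image> <lock-id> <locker>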
We are still missing three TC members' responses (Belimiro, Rico, and Graham) and anyone else from the community interested in office hours! -Kendall (diablo_rojo) On Tue, Jul 7, 2020 at 2:13 PM Kendall Nelson wrote: > Hello! > > I wanted to push this to the top of people's inboxes again. It looks like > we are still missing several TC member's responses, and I would love some > more community response as well since the office hours are FOR you! > > Please take a few min to fill out the survey for new office hours times[1]. > > -Kendall (diablo_rojo) > > [1] https://doodle.com/poll/q27t8pucq7b8xbme > > On Thu, Jul 2, 2020 at 2:52 PM Kendall Nelson > wrote: > >> Hello! >> >> It's been a while since the office hours had been refreshed and we have a >> lot of new people on the TC that were not around when the times were set. >> >> In an effort to stir things up a bit, and get more community engagement, >> we are picking new times! >> >> I want to invite everyone in the community interested in interacting more >> with the TC to respond to the poll so we have your input as the office >> hours are really for your benefit anyway. (Nevermind the name of the poll >> :) Too much work to remake the whole thing just to rename it..) >> >> That said, we do need responses from ALL TC members so that we can also >> document who will (typically) be present for each office hour as well. >> >> (Also, thanks Mohammed for putting the poll together! It's no joke. ) >> >> -Kendall (diablo_rojo) >> >> [1] https://doodle.com/poll/q27t8pucq7b8xbme >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Tue Jul 21 15:12:57 2020 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Tue, 21 Jul 2020 23:12:57 +0800 Subject: [all][TC] New Office Hours Times In-Reply-To: References: Message-ID: Thanks Kendall for the reminding, just send out my response On Tue, Jul 21, 2020 at 10:49 PM Kendall Nelson wrote: > Hello! > > We are still missing three TC members' responses (Belimiro, Rico, and > Graham) and anyone else from the community interested in office hours! > > -Kendall (diablo_rojo) > > On Tue, Jul 7, 2020 at 2:13 PM Kendall Nelson > wrote: > >> Hello! >> >> I wanted to push this to the top of people's inboxes again. It looks like >> we are still missing several TC member's responses, and I would love some >> more community response as well since the office hours are FOR you! >> >> Please take a few min to fill out the survey for new office hours >> times[1]. >> >> -Kendall (diablo_rojo) >> >> [1] https://doodle.com/poll/q27t8pucq7b8xbme >> >> On Thu, Jul 2, 2020 at 2:52 PM Kendall Nelson >> wrote: >> >>> Hello! >>> >>> It's been a while since the office hours had been refreshed and we have >>> a lot of new people on the TC that were not around when the times were set. >>> >>> In an effort to stir things up a bit, and get more community engagement, >>> we are picking new times! >>> >>> I want to invite everyone in the community interested in interacting >>> more with the TC to respond to the poll so we have your input as the office >>> hours are really for your benefit anyway. (Nevermind the name of the poll >>> :) Too much work to remake the whole thing just to rename it..) >>> >>> That said, we do need responses from ALL TC members so that we can also >>> document who will (typically) be present for each office hour as well. >>> >>> (Also, thanks Mohammed for putting the poll together! It's no joke. 
) >>> >>> -Kendall (diablo_rojo) >>> >>> [1] https://doodle.com/poll/q27t8pucq7b8xbme >>> >> -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Tue Jul 21 16:30:19 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 21 Jul 2020 09:30:19 -0700 Subject: [infra] CentOS support for mirror role in system-config In-Reply-To: <287457836.42289622.1595319924417.JavaMail.zimbra@redhat.com> References: <287457836.42289622.1595319924417.JavaMail.zimbra@redhat.com> Message-ID: <0ef5ba20-2fd2-4e39-b617-08a54279794a@www.fastmail.com> On Tue, Jul 21, 2020, at 1:25 AM, Javier Pena wrote: > Hi all, > > TL;DR: I have proposed a set of changes to add CentOS support to the > mirror role in system-config with [1] and would appreciate reviews. > > Long version: the RDO project maintains a set of mirrors that mimic > those provided by the OpenDev Infra team to jobs running in > review.opendev.org. The reason for this is to provide the same > environment for TripleO jobs to run on both OpenDev's and RDO's Gerrit > platforms. > > Previously, we used the Puppet modules from the system-config repo, > together with some unmerged changes to move that support to > puppet-openstackci, as it was suggested during the review process [2]. > Once those modules were obsoleted, we have proposed a set of changes to > the mirror ansible role [1] to add that support. > > I would appreciate reviews on those changes (thanks Ian for the first > reviews!). Some of them are small bugfixes to fix the already existing > CentOS support, while [3] is the one targeting the mirror role. One thing we've done as part of the shift from puppet to Ansible is intentionally "hide" these implementation details. Specifically all of the roles under system-config/playbooks/roles aren't really intended to be consumed externally. We've done this because back with the Puppet modules we invested extra effort to make the modules re-consumable, but then never got much help in making that viable. With the shift to Ansible we've taken that as an opportunity to make it clear we don't really intend to make these roles re-consumable (that is why they are in playbooks/roles). This has allowed us to reduce the number of platforms we care about as well as make changes assuming we are the only users. One specific concern along these lines is we've added https support to the mirrors. Our Xenial jobs are the last remaining place where https support isn't always available; once Xenial jobs are retired we'd like to force https. Doing that may break downstream users if they have consumers of their mirrors that cannot do https. This case may not apply to RDO, but I'm sure there are others that would. That said, the use case you describe is a reasonable one. I think several of the changes are relatively minor and don't present major concern (particularly those to roles/ not playbooks/roles/), and we have much better ability to test things now. I'm not sure what the best option is at this point. I'd like to selfishly retain the simplicity we've gained through the switch to Ansible. Would it make sense for RDO to use a copy of the role where centos support can be added? I guess the issue with this is the role has several other dependencies and isn't necessarily usable in isolation. Would RDO expect us to coordinate upstream changes to the mirrors with them? Curious to hear what others think. 
> > Thanks, > Javier > > [1] - > https://review.opendev.org/#/q/status:open+project:opendev/system-config+branch:master+topic:mirror-centos > [2] - > https://review.opendev.org/#/q/status:open+project:opendev/puppet-openstackci+branch:master+topic:afs-mirror-centos > [3] - https://review.opendev.org/736996 > > > From knikolla at bu.edu Tue Jul 21 16:55:05 2020 From: knikolla at bu.edu (Nikolla, Kristi) Date: Tue, 21 Jul 2020 16:55:05 +0000 Subject: [all][tc][glance] Meeting about Glance and WSGI support Message-ID: We're trying to organize a meeting between the TC, Glance team and other interested people with regard to Glance's merging (and backporting) the removal of support from the documentation about the WSGI mode of deployment (running Glance under HTTPD/uWSGI) Objective of the meeting is to find agreement about the path forward. See Etherpad for more context and captured notes. [0] The proposed time is this Thursday, July 23rd at 1500 UTC during the regularly scheduled TC Office Hours. [1] If that time doesn't work for you, and you'd like to participate in the meeting please respond to this email and we can work in finding a more appropriate time. Meeting will be over conference call, unless there are objections. Discussion will be captured in the same Etherpad. [0]. https://etherpad.opendev.org/p/tc-glance-wsgi [1]. http://eavesdrop.openstack.org/#Technical_Committee_Office_hours From mnaser at vexxhost.com Tue Jul 21 20:40:59 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 21 Jul 2020 16:40:59 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. # Patches ## Open Reviews - V goals, Zuul v3 migration: update links and grenade https://review.opendev.org/#/c/741987/ - [manila] assert:supports-accessible-upgrade https://review.opendev.org/740509 - migrate testing to ubuntu focal https://review.opendev.org/740851 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Add legacy repository validation https://review.opendev.org/737559 [Updated 12 days ago] - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 [Updated 13 days ago] - [draft] Add assert:supports-standalone https://review.opendev.org/722399 [Updated 27 days ago] # Email Threads - New Office Hours: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015761.html - Summit CFP Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015730.html # Other Reminders - If you're an operator, make sure you fill out our user survey: https://www.openstack.org/user-survey/survey-2020/ - Milestone 2 coming at the end of the month Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From feilong at catalyst.net.nz Tue Jul 21 21:42:19 2020 From: feilong at catalyst.net.nz (Feilong Wang) Date: Wed, 22 Jul 2020 09:42:19 +1200 Subject: Magnum: invalid format of client version In-Reply-To: References: Message-ID: <8a6f12e3-4c47-91aa-5d8d-96c5131ac534@catalyst.net.nz> Hi Alex, Could you please let me know what's the cluster image you're using Fedora Atomic or Fedora CoreOS? And can you pls show me your cluster template? 
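For example, something along these lines would show both (the names are placeholders for whatever you used):

  $ openstack coe cluster template show <your-template-name>
  $ openstack image show <your-cluster-image>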
On 22/07/20 12:53 am, Alexander Dibbo - UKRI STFC wrote: > > Hi, > > I have just deployed magnum into my train enviroment and am seeing the > following error when creating any kind of cluster: > > This is a Train environment deployed from RDO packages (9.4.0-1). > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server > [req-aa9ce18b-64eb-40ad-b1c0-b7c312402780 - - - - -] Exception during > message handling: InvalidParameterValue: ERROR: UnsupportedVersion: : > resources.worker_nodes_server_group: : Invalid format of client > version ''. Expected format 'X.Y', where X is a major part and Y is a > minor part of version. > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server Traceback (most > recent call last): > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server   File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line > 165, in _process_incoming > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server     res = > self.dispatcher.dispatch(message) > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server   File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 274, in dispatch > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server     return > self._do_dispatch(endpoint, method, ctxt, args) > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server   File > "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 194, in _do_dispatch > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server     result = > func(ctxt, **new_args) > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server   File > "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 160, > in wrapper > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server     result = > f(*args, **kwargs) > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server   File > "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", > line 95, in cluster_create > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server     raise e > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server > InvalidParameterValue: ERROR: UnsupportedVersion: : > resources.worker_nodes_server_group: : Invalid format of client > version ''. Expected format 'X.Y', where X is a major part and Y is a > minor part of version. 
> > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 > 14:08:24.942 6251 ERROR oslo_messaging.rpc.server #033[00m > > When logging in debug, I see a dump of a huge heat template and a 400 > bad request from heatclient (as below) immediately before the above > log excerpt: > > {"explanation": "The server could not comply with the request since it > is either malformed or otherwise incorrect.", "code": 400, "error": > {"message": "UnsupportedVersion: : > resources.master_nodes_server_group: : Invalid format of client > version ''. Expected format 'X.Y', where X is a major part and Y is a > minor part of version.", "traceback": null, "type": > "StackValidationFailed"}, "title": "Bad Request"} > > log_http_response > /usr/lib/python2.7/site-packages/heatclient/common/http.py:157 > >   > > More details are available in my question here: > https://ask.openstack.org/en/question/128520/magnum-invalid-format-of-client-version/ > >   > > Any suggestions on where to look to set the client version it is > complaining about would be much appreciated? > > Thanks > > Alex > >   > >   > > Regards > >   > > Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader > > For STFC Cloud Documentation visit > https://stfc-cloud-docs.readthedocs.io > > > To raise a support ticket with the cloud team please email > cloud-support at gridpp.rl.ac.uk > > To receive notifications about the service please subscribe to our > mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD > > To receive fast notifications or to discuss usage of the cloud please > join our Slack: https://stfc-cloud.slack.com/ > >   > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not > use, disclose, copy or distribute this email or any of its attachments > and should notify the sender immediately and delete this email from > your system. UK Research and Innovation (UKRI) has taken every > reasonable precaution to minimise risk of this email or any > attachments containing viruses or malware but the recipient should > carry out its own virus and malware checks before opening the > attachments. UKRI does not accept any liability for any losses or > damages which the recipient may sustain due to presence of any > viruses. Opinions, conclusions or other information in this message > and attachments that are not related directly to UKRI business are > solely those of the author and do not represent the views of UKRI. > -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From gagehugo at gmail.com Tue Jul 21 21:59:50 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Tue, 21 Jul 2020 16:59:50 -0500 Subject: [security] Security SIG Meeting July 23rd Cancelled Message-ID: Hello, The security SIG meeting for this week is cancelled, we plan on meeting at the usual time next week. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From laurentfdumont at gmail.com Wed Jul 22 00:03:30 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Tue, 21 Jul 2020 20:03:30 -0400 Subject: [Ocata][Heat] Strange error returned after stack creation failure -r aw template with id xxx not found Message-ID: Hi! We are currently troubleshooting a Heat stack issue where one of the stack (one of 25 or so) is failing to be created properly (seemingly randomly). The actual error returned by Heat is quite strange and Google has been quite sparse in terms of references. The actual error looks like the following (I've sanitized some of the names): Resource CREATE failed: resources.potato: Resource CREATE failed: resources[0]: raw template with id 22273 not found heat resource-list STACK_NAME_HERE -n 50 > > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > | resource_name | physical_resource_id | resource_type > | resource_status | updated_time | stack_name > | > > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > | potato | RESOURCE_ID_HERE | OS::Heat::ResourceGroup | > CREATE_FAILED | 2020-07-18 T19:52:10Z | nested_stack_1_STACK_NAME_HERE > | > | potato_server_group | RESOURCE_ID_HERE | OS::Nova::ServerGroup | > CREATE_COMPLETE | 2020-07-21T19:52:10Z | nested_stack_1_STACK_NAME_HERE > | > | 0 | | potato1.yaml > | CREATE_FAILED | 2020-07-18T19:52:12Z | nested_stack_2_STACK_NAME_HERE > | > | 1 | | potato1.yaml > | INIT_COMPLETE | 2020-07- 18 T19:52:12Z | > nested_stack_2_STACK_NAME_HERE | > > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > The template itself is pretty simple and attempts to create a ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that one the creation of those machines fails and Heat get's a little cooky and returns an error that might not be the actual root cause. I would have expected the VM to show up in the resource list but I just see the source "yaml". Has anyone seen something similar in the past? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From coolsvap at gmail.com Wed Jul 22 05:36:05 2020 From: coolsvap at gmail.com (=?UTF-8?B?yoLKjcmSz4HGnsSvxYIg0p7GsMi0xLfJksqByonJqA==?=) Date: Wed, 22 Jul 2020 11:06:05 +0530 Subject: [all] PyCharm Community licences Message-ID: I have renewed the Pycharm licenses for community contributors until July 20, 2021. For everyone who is using it will be updated automatically. Please do not request again for renewal. If you are an active contributor and need one, please submit the details on [1] [1] https://docs.google.com/forms/d/e/1FAIpQLSe5JMbtZEKB95AMVnyOBh4-7Y55hDgQChjg5Ed3auO74Tt2fQ/viewform Best Regards, Swapnil Best Regards, Swapnil Kulkarni irc : coolsvap coolsvap at gmail dot com +91-87960 10622(c) http://in.linkedin.com/in/coolsvap From e0ne at e0ne.info Wed Jul 22 07:26:07 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Wed, 22 Jul 2020 10:26:07 +0300 Subject: [horizon] Victoria virtual mid-cycle poll Message-ID: Hi team, As discussed at Horizon's Virtual PTG [1], we'll have a virtual mid-cycle meeting around Victoria-2 milestone. 
We'll discuss Horizon current cycle development priorities and the future of Horizon with modern JS frameworks. Please indicate your availability to meet for the first session, which will be held during the week of July 27-31: https://doodle.com/poll/3neps94amcreaw8q Please respond before 12:00 UTC on Tuesday 4 August. [1] https://etherpad.opendev.org/p/horizon-v-ptg Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza.b2008 at gmail.com Wed Jul 22 07:44:44 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Wed, 22 Jul 2020 12:14:44 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: The problem was due to there were no repositories in the generated image. I solved the problem with virt-customize, and the overcloud deployment completed successfully. But I think it's not a good and clean procedure. Thanks Yatin, I'll try the latest repo too. On Tue, 21 Jul 2020 at 12:46, Yatin Karel wrote: > Hi, > > On Sun, Jul 19, 2020 at 12:41 PM Reza Bakhshayeshi > wrote: > > > > As Ruslanas guided, the problem was solved by disabling gpgcheck. For me > there was no need of enabling HA repos. > > I think this process should be reported as a bug. > > > > Unfortunately, now my overcloud installation fails with: > > > > ... > > TASK [tripleo_podman : ensure podman and deps are installed] > ******************* > > task path: > /usr/share/ansible/roles/tripleo_podman/tasks/tripleo_podman_install.yml:21 > > Saturday 18 July 2020 15:04:29 +0430 (0:00:00.193) 0:04:37.581 > ********* > > Running dnf > > Using module file > /usr/lib/python3.6/site-packages/ansible/modules/packaging/os/dnf.py > > ... > > fatal: [overcloud-controller-0]: FAILED! => changed=false > > failures: > > - No package buildah available. > > invocation: > > module_args: > > allow_downgrade: false > > autoremove: false > > bugfix: false > > conf_file: null > > disable_excludes: null > > disable_gpg_check: false > > disable_plugin: [] > > disablerepo: [] > > download_dir: null > > download_only: false > > enable_plugin: [] > > enablerepo: [] > > exclude: [] > > install_repoquery: true > > install_weak_deps: true > > installroot: / > > list: null > > lock_timeout: 30 > > name: > > - podman > > - buildah > > releasever: null > > security: false > > skip_broken: false > > state: latest > > update_cache: false > > update_only: false > > validate_certs: true > > msg: Failed to install some of the specified packages > > rc: 1 > > results: [] > > ... > > > > Do you think the above error is something related to repos? > > The issue can happen when repos are not configured on overcloud nodes, > but in this particular case buildah is not needed on overcloud nodes, > which is fixed already[1], can u try again with latest repos. > > > [1] > https://review.opendev.org/#/q/Ibb91dfa9684b481dea34607fc47c0d531d56ee45 > > > > On Tue, 14 Jul 2020 at 18:20, Ruslanas Gžibovskis > wrote: > >> > >> I am not sure, but that might help. 
I use these steps for deployment: > >> > >> cp -ar /etc/yum.repos.d repos > >> sed -i s/gpgcheck=1/gpgcheck=0/g repos/*repo > >> export DIB_YUM_REPO_CONF="$(ls /home/stack/repos/*repo)" > >> export STABLE_RELEASE="ussuri" > >> export > OS_YAML="/usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml" > >> source /home/stack/stackrc > >> mkdir /home/stack/images > >> cd /home/stack/images > >> openstack overcloud image build --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml > --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > && openstack overcloud image upload --update-existing > >> cd /home/stack > >> ls /home/stack/images > >> > >> this works for all packages except: > >> > >> pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker > crudini openstack-selinux pacemaker pcs > >> > >> to solve these you need to enable in repos dir HA repo (change in > enable=0 to enable=1 > >> and then this will solve you issues with most except: > osops-tools-monitoring-oschecks > >> > >> this one, you can change by: > >> modify line in file: > >> /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map > >> to have this line: > >> "oschecks_package": "sysstat" > >> instead of "oschecks_package": "osops-tools-monitoring-oschecks > >> > >> " > >> > >> > >> > >> > >> On Tue, 14 Jul 2020 at 15:14, Alex Schultz wrote: > >>> > >>> On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi < > reza.b2008 at gmail.com> wrote: > >>> > > >>> > Thanks for your information. > >>> > Actually, I was in doubt of using Ussuri (latest version) for my > environment. > >>> > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but > overcloud image build got some error: > >>> > > >>> > $ openstack overcloud image build --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml > --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > >>> > > >>> > ... > >>> > 2020-07-14 12:14:22.714 | Running install-packages install. 
> >>> > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient > python3-barbicanclient python3-cinderclient python3-designateclient > python3-glanceclient python3-gnocchiclient python3-heatclient > python3-ironicclient python3-keystoneclient python3-manilaclient > python3-mistralclient python3-neutronclient python3-novaclient > python3-openstackclient python3-pankoclient python3-saharaclient > python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony > pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning > osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman > libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch > openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt > ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs > >>> > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, > config-manager, copr, debug, debuginfo-install, download, > generate_completion_cache, needs-restarting, playground, repoclosure, > repodiff, repograph, repomanage, reposync > >>> > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 > >>> > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum > >>> > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS > Linux 8; generic; Linux.x86_64)' > >>> > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream > >>> > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 > 23:25:16 2020. > >>> > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS > >>> > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 > 23:25:12 2020. > >>> > 2020-07-14 12:14:23.517 | repo: using cache for: extras > >>> > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 > 00:15:26 2020. > >>> > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 > ago on Tue Jul 14 11:43:38 2020. > >>> > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion > cache... > >>> > 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient > >>> > 2020-07-14 12:14:23.854 | No match for argument: > python3-barbicanclient > >>> > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient > >>> > 2020-07-14 12:14:23.862 | No match for argument: > python3-designateclient > >>> > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient > >>> > 2020-07-14 12:14:23.869 | No match for argument: > python3-gnocchiclient > >>> > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient > >>> > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient > >>> > 2020-07-14 12:14:23.880 | No match for argument: > python3-keystoneclient > >>> > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient > >>> > 2020-07-14 12:14:23.887 | No match for argument: > python3-mistralclient > >>> > 2020-07-14 12:14:23.891 | No match for argument: > python3-neutronclient > >>> > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient > >>> > 2020-07-14 12:14:23.898 | No match for argument: > python3-openstackclient > >>> > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient > >>> > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient > >>> > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient > >>> > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient > >>> > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is > already installed. 
> >>> > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already > installed. > >>> > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote > >>> > 2020-07-14 12:14:23.929 | No match for argument: > osops-tools-monitoring-oschecks > >>> > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker > >>> > 2020-07-14 12:14:23.936 | No match for argument: crudini > >>> > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux > >>> > 2020-07-14 12:14:23.953 | No match for argument: pacemaker > >>> > 2020-07-14 12:14:23.957 | No match for argument: pcs > >>> > 2020-07-14 12:14:23.961 | Error: Unable to find a match: > python3-aodhclient python3-barbicanclient python3-cinderclient > python3-designateclient python3-glanceclient python3-gnocchiclient > python3-heatclient python3-ironicclient python3-keystoneclient > python3-manilaclient python3-mistralclient python3-neutronclient > python3-novaclient python3-openstackclient python3-pankoclient > python3-saharaclient python3-swiftclient python3-zaqarclient > pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini > openstack-selinux pacemaker pcs > >>> > > >>> > Do you have any idea? > >>> > > >>> > >>> Seems like you are missing the correct DIP_YUM_REPO_CONF setting per > >>> #3 from > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images > >>> > >>> > > >>> > > >>> > On Mon, 13 Jul 2020 at 10:50, Marios Andreou > wrote: > >>> >> > >>> >> Hi folks, > >>> >> > >>> >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz > wrote: > >>> >>> > >>> >>> I don't believe centos8 containers are available for Train yet. The > >>> >>> error you're hitting is because it's fetching centos7 containers > and > >>> >>> the ironic container is not backwards compatible between the two > >>> >>> versions. If you want centos8, use Ussuri. 
> >>> >>> > >>> >> > >>> >> fyi we started pushing centos8 train last week - slightly different > namespace - latest current-tripleo containers are pushed to > https://hub.docker.com/u/tripleotraincentos8 > >>> >> > >>> >> hope it helps > >>> >> > >>> >>> > >>> >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi < > reza.b2008 at gmail.com> wrote: > >>> >>> > > >>> >>> > I found following error in ironic and container-puppet-ironic > container log during installation: > >>> >>> > > >>> >>> > puppet-user: Error: > /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: > Could not evaluate: Could not retrieve information from environment > production source(s) file:/tftpboot/ldlinux.c32 > >>> >>> > > >>> >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi < > reza.b2008 at gmail.com> wrote: > >>> >>> >> > >>> >>> >> Hi, > >>> >>> >> > >>> >>> >> I'm going to install OpenStack Train with the help of TripleO > on CentOS 8, but undercloud installation fails with the following error: > >>> >>> >> > >>> >>> >> "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > 
/Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ > '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit > 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying > running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed > running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- > Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 > ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: > 95117 -- ERROR configuring zaqar"]} > >>> >>> >> > >>> >>> >> Any suggestion would be grateful. > >>> >>> >> Regards, > >>> >>> >> Reza > >>> >>> >> > >>> >>> >> > >>> >>> > >>> >>> > >>> > >>> > >> > >> > >> -- > >> Ruslanas Gžibovskis > >> +370 6030 7030 > > > Thanks and Regards > Yatin Karel > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thierry at openstack.org Wed Jul 22 09:59:57 2020 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 22 Jul 2020 11:59:57 +0200 Subject: [largescale-sig] Next meeting: July 22, 8utc In-Reply-To: <68f03b6d-9481-79d0-ae05-95de9e2eae48@openstack.org> References: <68f03b6d-9481-79d0-ae05-95de9e2eae48@openstack.org> Message-ID: <168726f7-8141-ddfd-a76d-3d34862fff85@openstack.org> Meeting logs at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-07-22-08.00.html TODOs: - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - ttx to escalate OSops revival thread for osarchiver hosting - ttx to set alternating US-EU / EU-APAC meetings Next meetings: Aug 12, 16:00UTC, Aug 26, 8:00UTC (#openstack-meeting-3) -- Thierry Carrez (ttx) From jpena at redhat.com Wed Jul 22 10:13:39 2020 From: jpena at redhat.com (Javier Pena) Date: Wed, 22 Jul 2020 06:13:39 -0400 (EDT) Subject: [infra] CentOS support for mirror role in system-config In-Reply-To: <0ef5ba20-2fd2-4e39-b617-08a54279794a@www.fastmail.com> References: <287457836.42289622.1595319924417.JavaMail.zimbra@redhat.com> <0ef5ba20-2fd2-4e39-b617-08a54279794a@www.fastmail.com> Message-ID: <2023005475.42646212.1595412819383.JavaMail.zimbra@redhat.com> Thanks for your reply Clark, some comments inline > On Tue, Jul 21, 2020, at 1:25 AM, Javier Pena wrote: > > Hi all, > > > > TL;DR: I have proposed a set of changes to add CentOS support to the > > mirror role in system-config with [1] and would appreciate reviews. > > > > Long version: the RDO project maintains a set of mirrors that mimic > > those provided by the OpenDev Infra team to jobs running in > > review.opendev.org. The reason for this is to provide the same > > environment for TripleO jobs to run on both OpenDev's and RDO's Gerrit > > platforms. > > > > Previously, we used the Puppet modules from the system-config repo, > > together with some unmerged changes to move that support to > > puppet-openstackci, as it was suggested during the review process [2]. > > Once those modules were obsoleted, we have proposed a set of changes to > > the mirror ansible role [1] to add that support. > > > > I would appreciate reviews on those changes (thanks Ian for the first > > reviews!). Some of them are small bugfixes to fix the already existing > > CentOS support, while [3] is the one targeting the mirror role. > > One thing we've done as part of the shift from puppet to Ansible is > intentionally "hide" these implementation details. Specifically all of the > roles under system-config/playbooks/roles aren't really intended to be > consumed externally. We've done this because back with the Puppet modules we > invested extra effort to make the modules re-consumable, but then never got > much help in making that viable. With the shift to Ansible we've taken that > as an opportunity to make it clear we don't really intend to make these > roles re-consumable (that is why they are in playbooks/roles). This has > allowed us to reduce the number of platforms we care about as well as make > changes assuming we are the only users. > > One specific concern along these lines is we've added https support to the > mirrors. Our Xenial jobs are the last remaining place where https support > isn't always available; once Xenial jobs are retired we'd like to force > https. 
Doing that may break downstream users if they have consumers of their > mirrors that cannot do https. This case may not apply to RDO, but I'm sure > there are others that would. > In the RDO case, https is a welcome enhancement, actually. The only potential issue would be the hardcoded paths to the SSL certficates, but nothing we cannot work with. > That said, the use case you describe is a reasonable one. I think several of > the changes are relatively minor and don't present major concern > (particularly those to roles/ not playbooks/roles/), and we have much better > ability to test things now. I'm not sure what the best option is at this > point. I'd like to selfishly retain the simplicity we've gained through the > switch to Ansible. > > Would it make sense for RDO to use a copy of the role where centos support > can be added? I guess the issue with this is the role has several other > dependencies and isn't necessarily usable in isolation. > I'd prefer to avoid forking the role if possible, so we can automatically get updates when new mirrors are added (for example, https://review.opendev.org/738942, even if not relevant to our use case). If this copy of the role is actively maintained, though, it would be feasible. About role dependencies, I have been able to make it work with just the openafs-client and kerberos-client roles plus some glue code to provide the SSL certificate. To get CI jobs to pass I also had to update the base role [4], but this could be optional depending on the final setup. > Would RDO expect us to coordinate upstream changes to the mirrors with them? Not really. Most of the issues we have had so far have been related to new mirrors being missing from our config because we had not updated the under-review patches, so I do not expect a tight coordination to be required. Regards, Javier [4] - https://review.opendev.org/737043 > > Curious to hear what others think. > > > > > Thanks, > > Javier > > > > [1] - > > https://review.opendev.org/#/q/status:open+project:opendev/system-config+branch:master+topic:mirror-centos > > [2] - > > https://review.opendev.org/#/q/status:open+project:opendev/puppet-openstackci+branch:master+topic:afs-mirror-centos > > [3] - https://review.opendev.org/736996 > > > > > > > > From mnaser at vexxhost.com Wed Jul 22 13:05:55 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 22 Jul 2020 09:05:55 -0400 Subject: [all][tc][glance] Meeting about Glance and WSGI support In-Reply-To: References: Message-ID: On Tue, Jul 21, 2020 at 1:00 PM Nikolla, Kristi wrote: > > We're trying to organize a meeting between the TC, Glance team and other interested people with regard to Glance's merging (and backporting) the removal of support from the documentation about the WSGI mode of deployment (running Glance under HTTPD/uWSGI) > > Objective of the meeting is to find agreement about the path forward. > > See Etherpad for more context and captured notes. [0] > > The proposed time is this Thursday, July 23rd at 1500 UTC during the regularly scheduled TC Office Hours. [1] > > If that time doesn't work for you, and you'd like to participate in the meeting please respond to this email and we can work in finding a more appropriate time. I've also added an 'attendees' section, so please add yourself there as an attendee if you plan on attending, just to know who we're expecting. > Meeting will be over conference call, unless there are objections. Discussion will be captured in the same Etherpad. > > [0]. 
https://etherpad.opendev.org/p/tc-glance-wsgi > [1]. http://eavesdrop.openstack.org/#Technical_Committee_Office_hours > > > -- Mohammed Naser VEXXHOST, Inc. From alexander.dibbo at stfc.ac.uk Wed Jul 22 07:05:48 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Wed, 22 Jul 2020 07:05:48 +0000 Subject: Magnum: invalid format of client version In-Reply-To: <8a6f12e3-4c47-91aa-5d8d-96c5131ac534@catalyst.net.nz> References: <8a6f12e3-4c47-91aa-5d8d-96c5131ac534@catalyst.net.nz> Message-ID: Hi, We are using fedora atomic latest from Fedora-Atomic-27-20180419.0.x86_64.qcow2 +------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Field | Value | +------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | checksum | a8c45037711181872809eb13431402e6 | | container_format | bare | | created_at | 2019-09-09T12:37:41Z | | disk_format | qcow2 | | file | /v2/images/722db87f-10e2-4656-b7b8-0b78c2ef5aa8/file | | id | 722db87f-10e2-4656-b7b8-0b78c2ef5aa8 | | min_disk | 0 | | min_ram | 0 | | name | fedora-atomic-latest | | owner | c9aee696c4b54f12a645af2c951327dc | | properties | direct_url='rbd://55c25f6d-9e06-4a6a-8202-b087073ea8d6/cloud-dev/722db87f-10e2-4656-b7b8-0b78c2ef5aa8/snap', locations='[{u'url': u'rbd://55c25f6d-9e06-4a6a-8202-b087073ea8d6/cloud-dev/722db87f-10e2-4656-b7b8-0b78c2ef5aa8/snap', u'metadata': {}}]', os_distro='fedora-atomic', os_hash_algo=, os_hash_value=, os_hidden='False' | | protected | False | | schema | /v2/schemas/image | | size | 642286592 | | status | active | | tags | | | updated_at | 2019-09-09T12:37:54Z | | virtual_size | None | | visibility | private | +------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ The cluster template looks like this: +-----------------------+--------------------------------------+ | Field | Value | +-----------------------+--------------------------------------+ | insecure_registry | - | | labels | {} | | updated_at | - | | floating_ip_enabled | True | | fixed_subnet | - | | master_flavor_id | m1.small | | uuid | a81fe72b-e699-44b8-a7d0-cfc7525a42fd | | no_proxy | - | | https_proxy | - | | tls_disabled | False | | keypair_id | - | | public | False | | http_proxy | - | | docker_volume_size | 3 | | server_type | vm | | external_network_id | External | | cluster_distro | fedora-atomic | | image_id | fedora-atomic-latest | | volume_driver | - | | registry_enabled | False | | docker_storage_driver | devicemapper | | apiserver_port | - | | name | kubernetes-cluster-template | | created_at | 2020-07-21T09:25:26+00:00 | | network_driver | flannel | | fixed_network | - | | coe | kubernetes | | flavor_id | m1.small | | master_lb_enabled | False | | dns_nameserver | 
8.8.8.8 | | hidden | False | +-----------------------+--------------------------------------+ Regards Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Feilong Wang Sent: 21 July 2020 22:42 To: openstack-discuss at lists.openstack.org Subject: Re: Magnum: invalid format of client version Hi Alex, Could you please let me know what's the cluster image you're using Fedora Atomic or Fedora CoreOS? And can you pls show me your cluster template? On 22/07/20 12:53 am, Alexander Dibbo - UKRI STFC wrote: Hi, I have just deployed magnum into my train enviroment and am seeing the following error when creating any kind of cluster: This is a Train environment deployed from RDO packages (9.4.0-1). Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server [req-aa9ce18b-64eb-40ad-b1c0-b7c312402780 - - - - -] Exception during message handling: InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server Traceback (most recent call last): Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 160, in wrapper Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 95, in cluster_create Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server raise e Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 
2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server #033[00m When logging in debug, I see a dump of a huge heat template and a 400 bad request from heatclient (as below) immediately before the above log excerpt: {"explanation": "The server could not comply with the request since it is either malformed or otherwise incorrect.", "code": 400, "error": {"message": "UnsupportedVersion: : resources.master_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version.", "traceback": null, "type": "StackValidationFailed"}, "title": "Bad Request"} log_http_response /usr/lib/python2.7/site-packages/heatclient/common/http.py:157 More details are available in my question here: https://ask.openstack.org/en/question/128520/magnum-invalid-format-of-client-version/ Any suggestions on where to look to set the client version it is complaining about would be much appreciated? Thanks Alex Regards Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -- Cheers & Best regards, Feilong Wang (王飞龙) Head of R&D Catalyst Cloud - Cloud Native New Zealand -------------------------------------------------------------------------- Tel: +64-48032246 Email: flwang at catalyst.net.nz Level 6, Catalyst House, 150 Willis Street, Wellington -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Wed Jul 22 14:03:56 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Wed, 22 Jul 2020 15:03:56 +0100 Subject: Magnum: invalid format of client version In-Reply-To: References: Message-ID: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> Hi Alex Looks like the following errors are being emitted from Nova via Heat. 
Would you mind ensuring that you are able to spin up servers using nova CLI and then creating a simple heat stack via heat CLI. Cheers Bharat > On 21 Jul 2020, at 13:53, Alexander Dibbo - UKRI STFC wrote: > > Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Wed Jul 22 14:04:21 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 22 Jul 2020 16:04:21 +0200 Subject: [tripleo][centos8][ussuri][horizon] horizon container fails to start In-Reply-To: References: Message-ID: Hi all, Is there any way, how I could modify [0] so tripleO would not download again my modified image? Or could it be fixed? Should I raise bug report? is it in the launchpad again? Should it be for Horizon or for TripleO or Kolla? And yes, i understand that it tries to use python exec instead of python3 exec ;) but I am completely new in containerized setup, how to modify it "by hand" so TripleO deployment would not fix it back, and either way, if might be not working for more people, or it works for others?! And only me who is facing this issue? [0] https://github.com/openstack/kolla/blob/12905b5fc18c93fdece91df9a2446771d10dfbad/docker/horizon/extend_start.sh#L18 ? On Thu, 16 Jul 2020 at 10:38, Ruslanas Gžibovskis wrote: > Hi all, > > I have noticed, that horizon container fails to start and some > interestin zen_wozniak has apeared [0]. > Healthcheck log is empty, but horizon log [1] sais "/usr/bin/python: No > such file or directory" and there is no such file or directory :) > > after sume update it failed. I believe you guys will push update fast > enough, as I am still bad at this git and container part.... > HOW to fix it now :) on my side? As tripleo will redeploy horizon from > images... and will update image. could you please give me a hint where to > duck tape it whille it will be pushed to prod? > > [0] http://paste.openstack.org/show/3jjnsgXfWRxs3o0G6aKH/ > [1] http://paste.openstack.org/show/1S66A55cz0UaFUWGxID8/ > -- > Ruslanas Gžibovskis > +370 6030 7030 > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.dibbo at stfc.ac.uk Wed Jul 22 14:27:41 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Wed, 22 Jul 2020 14:27:41 +0000 Subject: Magnum: invalid format of client version In-Reply-To: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> Message-ID: <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> Hi Bharat, Creating a VM from the CLI works fine, as does the heat stack. 
In both cases using the same image, flavour and keypair/ Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Bharat Kunwar Sent: 22 July 2020 15:04 To: Dibbo, Alexander (STFC,RAL,SC) Cc: openstack-discuss at lists.openstack.org Subject: Re: Magnum: invalid format of client version Hi Alex Looks like the following errors are being emitted from Nova via Heat. Would you mind ensuring that you are able to spin up servers using nova CLI and then creating a simple heat stack via heat CLI. Cheers Bharat On 21 Jul 2020, at 13:53, Alexander Dibbo - UKRI STFC > wrote: Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From davanep2787 at gmail.com Wed Jul 22 16:13:23 2020 From: davanep2787 at gmail.com (PRAKASH DAVANE) Date: Wed, 22 Jul 2020 21:43:23 +0530 Subject: Need help with BlockDeviceSetupException Message-ID: Hi, I am trying Disk Image Builder and followed initial instructions given in DIB documentation. I tried creating basic ubuntu image with 'disk-image-create ubuntu vm" command. I am executing this on CentOS7 server installation on virtual box machine on my laptop. 
I am getting following error - 2020-07-22 14:38:38.181 | INFO diskimage_builder.block_device.utils [-] Calling [sudo sync] 2020-07-22 14:38:38.480 | INFO diskimage_builder.block_device.utils [-] Calling [sudo fstrim --verbose /tmp/dib_build.kjzn9JEP/mnt/] 2020-07-22 14:38:38.692 | INFO diskimage_builder.block_device.utils [-] Calling [sudo umount /tmp/dib_build.kjzn9JEP/mnt/] 2020-07-22 14:38:38.969 | INFO diskimage_builder.block_device.utils [-] Calling [sudo kpartx -d /dev/loop0] 2020-07-22 14:38:38.986 | Traceback (most recent call last): 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/bin/dib-block-device", line 8, in 2020-07-22 14:38:38.986 | sys.exit(main()) 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 120, in main 2020-07-22 14:38:38.986 | return bdc.main() 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 115, in main 2020-07-22 14:38:38.986 | self.args.func() 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 39, in cmd_umount 2020-07-22 14:38:38.986 | self.bd.cmd_umount() 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/blockdevice.py", line 442, in cmd_umount 2020-07-22 14:38:38.986 | node.umount() 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/level1/partition.py", line 88, in umount 2020-07-22 14:38:38.986 | self.partitioning.umount() 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/level1/partitioning.py", line 228, in umount 2020-07-22 14:38:38.986 | self.state['blockdev'][self.base]['device']]) 2020-07-22 14:38:38.986 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/utils.py", line 143, in exec_sudo 2020-07-22 14:38:38.986 | raise e 2020-07-22 14:38:38.987 | diskimage_builder.block_device.exception.BlockDeviceSetupException: exec_sudo failed 2020-07-22 14:38:39.974 | INFO diskimage_builder.block_device.level3.mount [-] Called for [mount_mkfs_root] 2020-07-22 14:38:39.974 | INFO diskimage_builder.block_device.utils [-] Calling [sudo sync] 2020-07-22 14:38:40.086 | INFO diskimage_builder.block_device.utils [-] Calling [sudo fstrim --verbose /tmp/dib_build.kjzn9JEP/mnt/] 2020-07-22 14:38:40.101 | Traceback (most recent call last): 2020-07-22 14:38:40.101 | File "/root/dib-virtualenv/bin/dib-block-device", line 8, in 2020-07-22 14:38:40.101 | sys.exit(main()) 2020-07-22 14:38:40.101 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 120, in main 2020-07-22 14:38:40.101 | return bdc.main() 2020-07-22 14:38:40.101 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 115, in main 2020-07-22 14:38:40.101 | self.args.func() 2020-07-22 14:38:40.101 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/cmd.py", line 39, in cmd_umount 2020-07-22 14:38:40.102 | self.bd.cmd_umount() 2020-07-22 14:38:40.102 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/blockdevice.py", line 442, in cmd_umount 2020-07-22 14:38:40.102 | node.umount() 2020-07-22 14:38:40.102 | File 
"/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/level3/mount.py", line 111, in umount 2020-07-22 14:38:40.102 | self.state['mount'][self.mount_point]['path']]) 2020-07-22 14:38:40.102 | File "/root/dib-virtualenv/lib/python2.7/site-packages/diskimage_builder/block_device/utils.py", line 143, in exec_sudo 2020-07-22 14:38:40.102 | raise e 2020-07-22 14:38:40.102 | diskimage_builder.block_device.exception.BlockDeviceSetupException: exec_sudo failed Can you please help me understand issue here? Thanks, Prakash -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Jul 22 19:28:58 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 22 Jul 2020 15:28:58 -0400 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: <8225c61e-687c-0116-da07-52443f315e43@est.tech> References: <8225c61e-687c-0116-da07-52443f315e43@est.tech> Message-ID: <63f0f9f5-0710-bcbd-f8c9-eeea5e5366cb@gmail.com> On 7/9/20 12:27 PM, Előd Illés wrote: > Hi, > > Sorry for sticking my nose into this thread (again o:)), just a couple > of thoughts: Always happy to see your nose :-) > - we had a rough month with failing Devstack and Tempest (and other) > jobs, but thanks to Gmann and others we could fix most of the issues > (except Tempest in Ocata, that's why it is announced generally as > Unmaintained [0]) > - this added some extra time to show a branch as unmaintained > - branches in extended maintenance are not that busy branches, but > still, I see some bugfix backports coming in even in Pike (in spite of > failing gate in the last month) > - Lee announced nova's Unmaintained state in the same circumstances, as > we just fixed Pike's devstack - and I also sent a reply that I will > continue to maintain nova's stable/pike as it is getting in a better > shape now > > Last but not least: in cinder, there are "Zuul +1"d gate fixes both for > Pike [1] (and Queens [2]), so it's not that hopeless. > > I don't want to keep a broken branch open in any cost, but does it cost > that much? I mean, if there is the possibility to push a fix, why don't > we let it happen? Right now Cinder Pike's gate seems working (with the > fix, which needs an approve [1]). We discussed this at the past two Cinder project team meetings, once to think about the idea and again today to make sure there were no second thoughts. I proposed that we would keep Pike open if someone on the cinder stable maintenance team were willing to "adopt" the branch. The silence was deafening. In short, no one on the core team is interested in approving patches for stable/pike, and no one in the wider Cinder project team of active contributors has any objections. > My suggestion is that let Pike still be in Extended Maintenance as it is > still have a working gate ([1]) and EOL Ocata as it was already about to > happen according to the mail thread [0], if necessary. We appreciate your suggestion, but the feeling of the Cinder project team is that we should EOL both Pike and Ocata. > > Also, please check the steps in 'End of Life' chapter of the stable > guideline [3] and let me offer my help if you need it for the transition. I appreciate your offer. I'll have the EOL patches posted shortly. The only thing I'm not sure about is whether there are zuul jobs in other repositories that are not needed any more. I don't think there are, but I may be having a failure of imagination in deciding where to look. 
> > Cheers, > > Előd > > [0] > http://lists.openstack.org/pipermail/openstack-discuss/2020-May/thread.html#15112 > > [1] https://review.opendev.org/#/c/737094/ > [2] https://review.opendev.org/#/c/737093/ > [3] > https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life > > > > > On 2020. 07. 08. 23:14, Brian Rosmaita wrote: >> Lee Yarwood recently announced the change to 'unmaintained' status of >> nova stable/ocata [0] and stable/pike [1] branches, with the clever >> idea of back-dating the 6 month period of un-maintenance to the most >> recent commit to each branch.  I took a look at cinder stable/ocata >> and stable/pike, and the most recent commit to each is 8 months ago >> and 7 months ago, respectively. >> >> The Cinder team discussed this at today's Cinder meeting and agreed >> that this email will serve as notice to the OpenStack Community that >> the following openstack/cinder branches have been in 'unmaintained' >> status for the past 6 months: >> - stable/ocata >> - stable/pike >> >> The Cinder team hereby serves notice that it is our intent to ask the >> openstack infra team to tag each as EOL at its current HEAD and delete >> the branches two weeks from today, that is, on Wednesday, 22 July 2020. >> >> (This applies also to the other stable-branched cinder repositories, >> that is, os-brick, python-cinderclient, and >> python-cinderclient-extension.) >> >> Please see [2] for information about the maintenance phases and what >> action would need to occur before 22 July for a branch to be adopted >> back to the 'extended maintenance' phase. >> >> On behalf of the Cinder team, thank you for your attention to this >> matter. >> >> >> cheers, >> brian >> >> >> [0] >> http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html >> >> [1] >> http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html >> >> [2] https://docs.openstack.org/project-team-guide/stable-branches.html >> > > From zbitter at redhat.com Wed Jul 22 19:40:44 2020 From: zbitter at redhat.com (Zane Bitter) Date: Wed, 22 Jul 2020 15:40:44 -0400 Subject: [Ocata][Heat] Strange error returned after stack creation failure -r aw template with id xxx not found In-Reply-To: References: Message-ID: <7fe6626a-0abb-97ca-fbfb-2066f426b9bf@redhat.com> On 21/07/20 8:03 pm, Laurent Dumont wrote: > Hi! > > We are currently troubleshooting a Heat stack issue where one of the > stack (one of 25 or so) is failing to be created properly (seemingly > randomly). > > The actual error returned by Heat is quite strange and Google has been > quite sparse in terms of references. > > The actual error looks like the following (I've sanitized some of the > names): > > Resource CREATE failed: resources.potato: Resource CREATE failed: > resources[0]: raw template with id 22273 not found When creating a nested stack, rather than just calling the RPC method to create a new stack, Heat stores the template in the database first and passes the ID in the RPC message.[1] (It turns out that by doing it this way we can save massive amounts of memory when processing a large tree of nested stacks.) My best guess is that this message indicates that the template row has been deleted by the time the other engine goes to look at it. I don't see how you could have got an ID like 22273 without the template having been successfully stored at some point. 
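In rough, self-contained pseudo-code the flow looks something like this (a simplified sketch only -- the helper names are illustrative and this is not the actual Heat code; see [1] below for the real implementation):

  # Minimal sketch of the store-template-then-RPC pattern; illustrative only,
  # not the actual Heat code.
  templates = {}
  _next_id = 0

  def store_raw_template(tpl):
      global _next_id
      _next_id += 1
      templates[_next_id] = tpl
      return _next_id

  def rpc_create_stack(name, template_id):
      # What the receiving engine does: resolve the id back to a template.
      if template_id not in templates:
          raise LookupError("raw template with id %s not found" % template_id)
      return {"stack": name, "template": templates[template_id]}

  def create_nested_stack(name, child_template):
      template_id = store_raw_template(child_template)
      try:
          return rpc_create_stack(name, template_id)
      except Exception:
          # On an RPC error (including a timeout) the stored template is
          # deleted again. If the original message is nevertheless picked up
          # later by an engine, only the "not found" error remains.
          templates.pop(template_id, None)
          raise

(The RPC timeout mentioned further down is normally governed by oslo.messaging's rpc_response_timeout option in heat.conf, 60 seconds by default, which is the usual knob to raise if engines are slow to pick messages up.)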
The template is only supposed to be deleted if the RPC call returns with an error.[2] The only way I can think of for that to happen before an attempt to create the child stack is if the RPC call times out, but the original message is eventually picked up by an engine. I would check your logs for RPC timeouts and consider increasing them. What does the status_reason look like at one level above in the tree? That should indicate the first error that caused the template to be deleted. > heat resource-list STACK_NAME_HERE -n 50 > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > | resource_name    | physical_resource_id                 | > resource_type           | resource_status | updated_time         | > stack_name >     | > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > | potato              | RESOURCE_ID_HERE | OS::Heat::ResourceGroup | > CREATE_FAILED   | 2020-07-18 T19:52:10Z | > nested_stack_1_STACK_NAME_HERE                  | > | potato_server_group | RESOURCE_ID_HERE | OS::Nova::ServerGroup   | > CREATE_COMPLETE | 2020-07-21T19:52:10Z | > nested_stack_1_STACK_NAME_HERE                  | > | 0                |                                      | > potato1.yaml     | CREATE_FAILED   | 2020-07-18T19:52:12Z | > nested_stack_2_STACK_NAME_HERE | > | 1                |                                      | > potato1.yaml     | INIT_COMPLETE   | 2020-07- 18 T19:52:12Z | > nested_stack_2_STACK_NAME_HERE | > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > > > The template itself is pretty simple and attempts to create a > ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that > one the creation of those machines fails and Heat get's a little cooky > and returns an error that might not be the actual root cause. I would > have expected the VM to show up in the resource list but I just see the > source "yaml". It's clear from the above output that the scaled unit of the resource group is in fact a template (not an OS::Nova::Server), and the error is occurring trying to create a stack from that template (potato1.yaml) - before Heat even has a chance to start creating the server. > Has anyone seen something similar in the past? Nope. cheers, Zane. [1] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L367-L384 [2] https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L335-L342 From noonedeadpunk at ya.ru Wed Jul 22 19:52:09 2020 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Wed, 22 Jul 2020 22:52:09 +0300 Subject: [openstack-ansible] os_congress role retirement Message-ID: <7450201595447276@mail.yandex.ru> Hi everyone. Since congress service has been retired with [1], there's no reason for us to carry on and maintain os_congress role [2] for the inactive project. In case congress service will be revived, we can revert role retirement and continue its support in the future. 
[1] https://review.opendev.org/#/c/721733/ [2] https://opendev.org/openstack/openstack-ansible-os_congress -- Kind regards, Dmitriy Rabotyagov From noonedeadpunk at ya.ru Wed Jul 22 19:59:38 2020 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Wed, 22 Jul 2020 22:59:38 +0300 Subject: [openstack-ansible] os_congress role retirement In-Reply-To: <7450201595447276@mail.yandex.ru> References: <7450201595447276@mail.yandex.ru> Message-ID: <10221595447887@mail.yandex.ru> An HTML attachment was scrubbed... URL: From laurentfdumont at gmail.com Wed Jul 22 20:42:29 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Wed, 22 Jul 2020 16:42:29 -0400 Subject: ask.openstack.org | Down? Message-ID: Hi! Trying to find some hints on why ask.openstack.org is down. Not seeing any maintenance alerts/decommissioning. Just curious :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Jul 22 21:00:38 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 22 Jul 2020 17:00:38 -0400 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: References: Message-ID: <940b980c-0d11-47d3-6312-5bf2abf0d75a@gmail.com> On 7/8/20 5:14 PM, Brian Rosmaita wrote: [snip] > This email will serve as notice to the OpenStack Community that the > following openstack/cinder branches have been in 'unmaintained' status > for the past 6 months: > - stable/ocata > - stable/pike > > The Cinder team hereby serves notice that it is our intent to ask the > openstack infra team to tag each as EOL at its current HEAD and delete > the branches two weeks from today, that is, on Wednesday, 22 July 2020. As promised, here are the EOL patches: ocata: https://review.opendev.org/742513 pike: https://review.opendev.org/742523 From sean.mcginnis at gmx.com Wed Jul 22 21:13:43 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 22 Jul 2020 16:13:43 -0500 Subject: [release] Release countdown for week R-12 July 20 - July 24 Message-ID: <20200722211343.GA2877982@sm-workstation> Better late than never? Could have sworn I sent this last week. Development Focus ----------------- The Victoria-2 milestone is next week, on July 30! Victoria-related specs should now be finalized so that teams can move to implementation ASAP. Some teams observe specific deadlines on the second milestone (mostly spec freezes): please refer to https://releases.openstack.org/victoria/schedule.html for details. General Information ------------------- Libraries need to be released at least once per milestone period. Next week, the release team will propose releases for any library that has not been otherwise released since milestone 1. PTL's and release liaisons, please watch for these and give a +1 to acknowledge them. If there is some reason to hold off on a release, let us know that as well. A +1 would be appreciated, but if we do not hear anything at all by the end of the week, we will assume things are OK to proceed. Remember that non-library deliverables that follow the cycle-with-intermediary release model should have an intermediary release before milestone-2. Those who haven't will be proposed to switch to the cycle-with-rc model, which is more suited to deliverables that are released only once per cycle. Next week is also the deadline to freeze the contents of the final release. 
All new 'Victoria' deliverables need to have a deliverable file in https://opendev.org/openstack/releases/src/branch/master/deliverables and need to have done a release by milestone-2. Upcoming Deadlines & Dates -------------------------- Victoria-2 milestone: July 30 Ussuri Cycle-trailing deadline: August 13 From fungi at yuggoth.org Wed Jul 22 21:26:22 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 22 Jul 2020 21:26:22 +0000 Subject: [infra] ask.openstack.org | Down? In-Reply-To: References: Message-ID: <20200722212621.5ao6y7fyt36kbg4m@yuggoth.org> On 2020-07-22 16:42:29 -0400 (-0400), Laurent Dumont wrote: > Trying to find some hints on why ask.openstack.org is down. Not > seeing any maintenance alerts/decommissioning. > > Just curious :) Thanks for the heads up, looks like that VM hung, likely some unpleasantness at the hypervisor host layer but it's public cloud so really no idea (some hung kernel tasks on the console, not responding to input). After an `openstack server reboot --hard ...` it's back up again. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ltoscano at redhat.com Wed Jul 22 22:24:41 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Wed, 22 Jul 2020 18:24:41 -0400 (EDT) Subject: [all][goals] Switch legacy Zuul jobs to native - update #1 In-Reply-To: <285113831.42823595.1595456562440.JavaMail.zimbra@redhat.com> Message-ID: <53977697.42823616.1595456681678.JavaMail.zimbra@redhat.com> Hi, Victoria is not far away and one of the goal is the removal of all legacy Zuul v3 jobs. tl;dr - the goal page is https://governance.openstack.org/tc/goals/selected/victoria/native-zuulv3-jobs.html - that page is being updated (https://review.opendev.org/#741987/), please check the migration guide on: https://docs.openstack.org/project-team-guide/zuulv3.html - I summarized the status on this etherpad, which will be used as reference: https://etherpad.opendev.org/p/goal-victoria-native-zuulv3-migration - I'm around as tosky on IRC, feel free to ping me or ask here on the list, especially if I haven't contacted you already - if you haven't started yet, please prioritize this goal, which will also provide most of the work for the other community goal (migrate from bionic to focal) almost for free. --- And now, the longer version. It's time for some updates about the "Switch legacy Zuul jobs to native" community goal. I apologize for the lack of official communication so far about the goal, despite the Victoria cycle has been open for a (long) while. On the other hand, the porting of the legacy is ongoing. Thanks to everyone who helped with the rewriting even before this was a goal, and thanks to the people who started working on this (writing and reviewing patches). I've got in touch with a few teams during the PTG and during the team meetings in the past weeks, but unfortunately attended the PTG or is active on IRC, and with few exceptions, a significant amount of legacy jobs belongs to less active teams. The projects which right now use legacy job, regardless of the status (many of them have patches floating around), are listed below. 
Projects with (*) have been contacted by me during or after the PTG and the work is ongoing: - barbican (*) - blazar (*) - cinder (*) - designate - ec2-api - freezer - glance (*) - heat (*) - infra (*) - ironic (*) - karbor - keystone (*) - kuryr (*) - magnum - manila (*) - monasca (*) - murano - neutron (*) - nova (*) - oslo (*) - senlin - trove - tripleo (*) - vitrage - watcher - zaqar Back to the etherpad (https://etherpad.opendev.org/p/goal-victoria-native-zuulv3-migration): the most important section is the first one, "Jobs to be ported", which includes all the legacy jobs which affects Victoria. While the goal focuses on Victoria, when the jobs are ported, in order to easy the maintenance, I believe it would make everyone's life easier if you just backport to the new jobs to the older branches, as far as it's possible. As a general remark, that native tempest and devstack jobs should work to all branches starting from pike, while the new grenade jobs only work starting from train right now (they may be enabled in stein in the future, but it's not sure yet). Please remember to always use the native-zuulv3-migration gerrit topic. The "Stretch goal only (they don't affect master)" section lists the jobs defined in openstack/openstack-zuul-jobs.git which are still used by stable branches. If you feel like that and if it's possible, please backport the native jobs to the not-yet covered branches, so that we can clean up openstack-zuul-jobs. Ciao -- Luigi From laurentfdumont at gmail.com Wed Jul 22 22:42:16 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Wed, 22 Jul 2020 18:42:16 -0400 Subject: [infra] ask.openstack.org | Down? In-Reply-To: <20200722212621.5ao6y7fyt36kbg4m@yuggoth.org> References: <20200722212621.5ao6y7fyt36kbg4m@yuggoth.org> Message-ID: Happens to the best of us! ;) Looks back up, thanks! On Wed, Jul 22, 2020 at 5:32 PM Jeremy Stanley wrote: > On 2020-07-22 16:42:29 -0400 (-0400), Laurent Dumont wrote: > > Trying to find some hints on why ask.openstack.org is down. Not > > seeing any maintenance alerts/decommissioning. > > > > Just curious :) > > Thanks for the heads up, looks like that VM hung, likely some > unpleasantness at the hypervisor host layer but it's public cloud so > really no idea (some hung kernel tasks on the console, not > responding to input). After an `openstack server reboot --hard ...` > it's back up again. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tessa at plum.ovh Thu Jul 23 01:51:31 2020 From: tessa at plum.ovh (Tessa Plum) Date: Thu, 23 Jul 2020 09:51:31 +0800 Subject: ask.openstack.org | Down? In-Reply-To: References: Message-ID: <74471d2f-37bf-d296-a294-647bf23783f0@plum.ovh> Laurent Dumont wrote: > Trying to find some hints on why ask.openstack.org > is down. Not seeing any maintenance > alerts/decommissioning. Here is operated fine. :) Tessa From iwienand at redhat.com Thu Jul 23 04:21:03 2020 From: iwienand at redhat.com (Ian Wienand) Date: Thu, 23 Jul 2020 14:21:03 +1000 Subject: [infra] CentOS support for mirror role in system-config In-Reply-To: <0ef5ba20-2fd2-4e39-b617-08a54279794a@www.fastmail.com> References: <287457836.42289622.1595319924417.JavaMail.zimbra@redhat.com> <0ef5ba20-2fd2-4e39-b617-08a54279794a@www.fastmail.com> Message-ID: <20200723042103.GA1740223@fedora19.localdomain> On Tue, Jul 21, 2020 at 09:30:19AM -0700, Clark Boylan wrote: > One specific concern along these lines is we've added https support > to the mirrors. 
Another thing I can see coming is kafs support; which requires recent kernels but is becoming available in Debian. Just another area we'll probably want to play in that is distro specific. > Would RDO expect us to coordinate upstream changes to the mirrors > with them? Perhaps we should quantify what the bits are we need? As I mentioned, I've been shy to move the openafs roles outside system-config because they rely on debs/rpms built specifically by us to work around no-packages (rpm) or out of date packages (deb). I don't want anyone to think they're generic then we break something when we update the packages for our own purposes. There isn't really any magic in the the apache setup done in the mirror role; it's more or less a straight "install packages put config in" role. That argument cuts both ways -- it's not much for system-config to maintain but it's not really much to duplicate outside. The mirror config I can see us wanting to be in sync with. I'd be happy to move that into a separate role, with a few paramaters to make it write out in different locations, etc. instead of lumping it all in with the server setup? Is that a compromise position between keeping centos servers in system-config and making things reusable? Are there other roles of concern? -i From iwienand at redhat.com Thu Jul 23 04:25:02 2020 From: iwienand at redhat.com (Ian Wienand) Date: Thu, 23 Jul 2020 14:25:02 +1000 Subject: Need help with BlockDeviceSetupException In-Reply-To: References: Message-ID: <20200723042502.GB1740223@fedora19.localdomain> On Wed, Jul 22, 2020 at 09:43:23PM +0530, PRAKASH DAVANE wrote: > I am trying Disk Image Builder and followed initial instructions given > in DIB documentation. I tried creating basic ubuntu image with > 'disk-image-create ubuntu vm" command. I am executing this on CentOS7 > server installation on virtual box machine on my laptop. I am getting > following error - > Calling [sudo kpartx -d /dev/loop0] > exec_sudo failed Let's track this in https://bugs.launchpad.net/diskimage-builder/+bug/1888557 which I see you filed, thanks. For reference, we don't log the output of the failed command at the default log level, so running this with "-x" will be the next step to figure out what's going on. Thanks, -i From radoslaw.piliszek at gmail.com Thu Jul 23 08:26:12 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 23 Jul 2020 10:26:12 +0200 Subject: [kolla] Kall Message-ID: Hello Fellow OpenStackers, today (2020-07-23) marks the day of the 2nd Kolla Kall [1]. Everyone interested in Kolla (and friends) development is invited to join. Kall starts at 15 (UTC) at https://meetpad.opendev.org/KollaKall and lasts one hour. [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Jul 23 12:08:59 2020 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 23 Jul 2020 14:08:59 +0200 Subject: [Release-job-failures] Release of openstack/oslo.messaging for ref refs/tags/12.2.2 failed In-Reply-To: References: Message-ID: <76cd6447-79e2-fdcb-befa-eb5b0a5c61b3@openstack.org> zuul at openstack.org wrote: > Build failed. 
> > - openstack-upload-github-mirror https://zuul.opendev.org/t/openstack/build/2a9767af93c640ddb1bd1f864b2e71e3 : SUCCESS in 1m 03s > - release-openstack-python https://zuul.opendev.org/t/openstack/build/dfce7b106189491b9d9026a079c06bdd : POST_FAILURE in 4m 19s > - announce-release https://zuul.opendev.org/t/openstack/build/None : SKIPPED > - propose-update-constraints https://zuul.opendev.org/t/openstack/build/None : SKIPPED Two recent issues with AFS publication: oslo.messaging 12.2.2 - tag OK, build OK, pyPI OK, tarball not published https://zuul.opendev.org/t/openstack/build/dfce7b106189491b9d9026a079c06bdd designate 8.0.1 - tag OK, build OK, pyPI OK, tarball not published https://zuul.opendev.org/t/openstack/build/0f6c659223df46278c26460b2f3281fe Error: There was an issue creating /afs/.openstack.org as requested: [Errno 13] Permission denied: b'/afs/.openstack.org' Impact: - Tarballs are missing from tarballs.o.o - Missing release announces - Missing constraint updates We also had one AFS issue with docs publication to releases.o.o: https://zuul.opendev.org/t/openstack/build/3ca65d8665514c429629d18485ed186b But that should get synced at next refresh. -- Thierry Carrez (ttx) From ekuvaja at redhat.com Thu Jul 23 12:26:35 2020 From: ekuvaja at redhat.com (Erno Kuvaja) Date: Thu, 23 Jul 2020 13:26:35 +0100 Subject: [all][tc][glance] Meeting about Glance and WSGI support In-Reply-To: References: Message-ID: On Tue, Jul 21, 2020 at 6:00 PM Nikolla, Kristi wrote: > We're trying to organize a meeting between the TC, Glance team and other > interested people with regard to Glance's merging (and backporting) the > removal of support from the documentation about the WSGI mode of deployment > (running Glance under HTTPD/uWSGI) > > Objective of the meeting is to find agreement about the path forward. > > See Etherpad for more context and captured notes. [0] > > The proposed time is this Thursday, July 23rd at 1500 UTC during the > regularly scheduled TC Office Hours. [1] > > If that time doesn't work for you, and you'd like to participate in the > meeting please respond to this email and we can work in finding a more > appropriate time. > > Meeting will be over conference call, unless there are objections. > Discussion will be captured in the same Etherpad. > > [0]. https://etherpad.opendev.org/p/tc-glance-wsgi > [1]. http://eavesdrop.openstack.org/#Technical_Committee_Office_hours > > I've scheduled the conf call for us on https://bluejeans.com/389964183 The web client should work fine on all platforms so no need to install anything special to join the discussion. - jokke -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Jul 23 12:57:05 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 23 Jul 2020 12:57:05 +0000 Subject: [kolla] Kall In-Reply-To: References: Message-ID: <20200723125705.yhsehx64vz4klpgv@yuggoth.org> On 2020-07-23 10:26:12 +0200 (+0200), Radosław Piliszek wrote: > Hello Fellow OpenStackers, > > today (2020-07-23) marks the day of the 2nd Kolla Kall [1]. > Everyone interested in Kolla (and friends) development is invited to join. > Kall starts at 15 (UTC) at https://meetpad.opendev.org/KollaKall > and lasts one hour. 
> > [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall Note that Etherpad URLs are case-sensitive and Meetpad URLs are case-insensitive, so we *strongly* recommend making your Meetpad URLs all lower-case as a result (because they'll end up mapping to the lower-case version of their name on the Etherpad side). We've discussed how we might go about faking case-insensitivity in our Etherpad through some creative Apache rewrites, but before we can do that we have to analyze and clean up thousands of case-insensitivity collisions between Etherpad names, which nobody's had time to tackle yet. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From elod.illes at est.tech Thu Jul 23 14:56:53 2020 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Thu, 23 Jul 2020 16:56:53 +0200 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: <63f0f9f5-0710-bcbd-f8c9-eeea5e5366cb@gmail.com> References: <8225c61e-687c-0116-da07-52443f315e43@est.tech> <63f0f9f5-0710-bcbd-f8c9-eeea5e5366cb@gmail.com> Message-ID: Hi Brian, Sorry to hear that :( As a side note, maybe it's good to mention, that bugfixes are getting merged in Pike now as gate is fixed (e.g. in Nova [1], where I am stable core, and also in Neutron[2]). Maybe before you EOL Cinder's Pike, it would be nice to review & merge at least the open patches [3], I can help with the review as soon as the gate fixing patch [4] has merged (which I have already reviewed :)). To be honest I haven't reviewed yet the other patches because I reviewed first the gate fixing ones and waited them to get merged. Anyway, I'm always happy to help with stable reviews, at least from stable core point of view (but I can only give +1 for patches in Cinder). About the Cinder zuul jobs in EOL candidate branches: I'll go through the zuul jobs in Pike and Ocata in Cinder to look for unused job definitions and propose deletion patch if there are such. Thanks, Előd [1] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike+status:merged [2] https://review.opendev.org/#/q/project:openstack/neutron+branch:stable/pike+status:merged [3] https://review.opendev.org/#/q/project:openstack/cinder+branch:stable/pike+status:open [4] https://review.opendev.org/#/c/737094/ On 2020. 07. 22. 21:28, Brian Rosmaita wrote: > On 7/9/20 12:27 PM, Előd Illés wrote: >> Hi, >> >> Sorry for sticking my nose into this thread (again o:)), just a >> couple of thoughts: > > Always happy to see your nose :-) > >> - we had a rough month with failing Devstack and Tempest (and other) >> jobs, but thanks to Gmann and others we could fix most of the issues >> (except Tempest in Ocata, that's why it is announced generally as >> Unmaintained [0]) >> - this added some extra time to show a branch as unmaintained >> - branches in extended maintenance are not that busy branches, but >> still, I see some bugfix backports coming in even in Pike (in spite >> of failing gate in the last month) >> - Lee announced nova's Unmaintained state in the same circumstances, >> as we just fixed Pike's devstack - and I also sent a reply that I >> will continue to maintain nova's stable/pike as it is getting in a >> better shape now >> >> Last but not least: in cinder, there are "Zuul +1"d gate fixes both >> for Pike [1] (and Queens [2]), so it's not that hopeless. 
>> >> I don't want to keep a broken branch open in any cost, but does it >> cost that much? I mean, if there is the possibility to push a fix, >> why don't we let it happen? Right now Cinder Pike's gate seems >> working (with the fix, which needs an approve [1]). > > We discussed this at the past two Cinder project team meetings, once > to think about the idea and again today to make sure there were no > second thoughts.  I proposed that we would keep Pike open if someone > on the cinder stable maintenance team were willing to "adopt" the > branch.  The silence was deafening.  In short, no one on the core team > is interested in approving patches for stable/pike, and no one in the > wider Cinder project team of active contributors has any objections. >> My suggestion is that let Pike still be in Extended Maintenance as it >> is still have a working gate ([1]) and EOL Ocata as it was already >> about to happen according to the mail thread [0], if necessary. > > We appreciate your suggestion, but the feeling of the Cinder project > team is that we should EOL both Pike and Ocata. > >> >> Also, please check the steps in 'End of Life' chapter of the stable >> guideline [3] and let me offer my help if you need it for the >> transition. > > I appreciate your offer.  I'll have the EOL patches posted shortly.  > The only thing I'm not sure about is whether there are zuul jobs in > other repositories that are not needed any more.  I don't think there > are, but I may be having a failure of imagination in deciding where to > look. > >> >> Cheers, >> >> Előd >> >> [0] >> http://lists.openstack.org/pipermail/openstack-discuss/2020-May/thread.html#15112 >> >> [1] https://review.opendev.org/#/c/737094/ >> [2] https://review.opendev.org/#/c/737093/ >> [3] >> https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life >> >> >> >> >> On 2020. 07. 08. 23:14, Brian Rosmaita wrote: >>> Lee Yarwood recently announced the change to 'unmaintained' status >>> of nova stable/ocata [0] and stable/pike [1] branches, with the >>> clever idea of back-dating the 6 month period of un-maintenance to >>> the most recent commit to each branch.  I took a look at cinder >>> stable/ocata and stable/pike, and the most recent commit to each is >>> 8 months ago and 7 months ago, respectively. >>> >>> The Cinder team discussed this at today's Cinder meeting and agreed >>> that this email will serve as notice to the OpenStack Community that >>> the following openstack/cinder branches have been in 'unmaintained' >>> status for the past 6 months: >>> - stable/ocata >>> - stable/pike >>> >>> The Cinder team hereby serves notice that it is our intent to ask >>> the openstack infra team to tag each as EOL at its current HEAD and >>> delete the branches two weeks from today, that is, on Wednesday, 22 >>> July 2020. >>> >>> (This applies also to the other stable-branched cinder repositories, >>> that is, os-brick, python-cinderclient, and >>> python-cinderclient-extension.) >>> >>> Please see [2] for information about the maintenance phases and what >>> action would need to occur before 22 July for a branch to be adopted >>> back to the 'extended maintenance' phase. >>> >>> On behalf of the Cinder team, thank you for your attention to this >>> matter. 
>>> >>> >>> cheers, >>> brian >>> >>> >>> [0] >>> http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html >>> >>> [1] >>> http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html >>> >>> [2] https://docs.openstack.org/project-team-guide/stable-branches.html >>> >> >> > > From erin at openstack.org Thu Jul 23 15:08:09 2020 From: erin at openstack.org (Erin Disney) Date: Thu, 23 Jul 2020 10:08:09 -0500 Subject: The Open Infrastructure Summit is Going Virtual! Message-ID: <804945D7-6220-4B22-BEA0-306E9300A796@openstack.org> Hi everyone, The Open Infrastructure Summit will officially be held virtually, the week of October 19. Emails about project onboarding / updates, PTG, and the Forum will be coming later, but below are ways you can currently get prepared: Register for free and make sure to mark your calendars for October 19 - 23. Get your Summit submission together now! The CFP deadline is August 4 at 11:59pm PT, so start drafting your presentations and panels around Open Infrastructure use cases like AI/Machine Learning, CI/CD, Container Infrastructure, Edge Computing and of course, Public, Private and Hybrid Clouds. Get your organization some visibility - check out the prospectus and sign up to sponsor the virtual Summit! Executable Sponsorship Contract will be available starting at 10:30am CT on Tuesday, July 28. If you have any questions, please reach out to summit at openstack.org . “See you” in October! Cheers, Erin Erin Disney OpenStack Marketing erin at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 23 16:05:43 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 23 Jul 2020 18:05:43 +0200 Subject: [kolla] Kall In-Reply-To: <20200723125705.yhsehx64vz4klpgv@yuggoth.org> References: <20200723125705.yhsehx64vz4klpgv@yuggoth.org> Message-ID: On Thu, Jul 23, 2020 at 2:59 PM Jeremy Stanley wrote: > On 2020-07-23 10:26:12 +0200 (+0200), Radosław Piliszek wrote: > > Hello Fellow OpenStackers, > > > > today (2020-07-23) marks the day of the 2nd Kolla Kall [1]. > > Everyone interested in Kolla (and friends) development is invited to > join. > > Kall starts at 15 (UTC) at https://meetpad.opendev.org/KollaKall > > and lasts one hour. > > > > [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall > > Note that Etherpad URLs are case-sensitive and Meetpad URLs are > case-insensitive, so we *strongly* recommend making your Meetpad > URLs all lower-case as a result (because they'll end up mapping to > the lower-case version of their name on the Etherpad side). We've > discussed how we might go about faking case-insensitivity in our > Etherpad through some creative Apache rewrites, but before we can do > that we have to analyze and clean up thousands of case-insensitivity > collisions between Etherpad names, which nobody's had time to tackle > yet. > -- > Jeremy Stanley > Thanks, Jeremy, that makes sense. I wouldn't stress the redirects too much - maybe just redirect on the Meetpad side (always normalize to lowercase there)? That said, we had some technical problems during our Kall today; there were only five of us but there were sound (stuttering) and image (very abnormal latency) issues unfortunately. We reused the Google Meet channel made for Kolla Klub to avoid the issues (and it was smooth). It worked better the last time, just not today. :-( -yoctozepto -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From summit at openstack.org Thu Jul 23 16:39:34 2020 From: summit at openstack.org (OpenStack Foundation) Date: Thu, 23 Jul 2020 16:39:34 -0000 Subject: The Open Infrastructure Summit is Going Virtual! Message-ID: Call For Presentations is open! The annual Open Infrastructure Summit is going to be virtual (and free!), increasing accessibility for community members around the world. Register for the virtual Summit and mark your calendars - the Open Infrastructure Summit will be held October 19-23. The Call For Presentations (CFP) deadline is August 4 at 11:59pm PT, so start drafting your presentations and panels around Open Infrastructure use cases like AI/Machine Learning, CI/CD, Container Infrastructure, Edge Computing and of course, Public, Private and Hybrid Clouds. Invite your friends to log in on October 19 for a packed agenda from the global community and join the conversation using #OpenInfraSummit. To get your organization some visibility, check out the prospectus and sign up to sponsor the virtual Summit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Jul 23 17:32:58 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 23 Jul 2020 12:32:58 -0500 Subject: [PTL][Stable] Releases proposed for stable/stein In-Reply-To: <821d2cfa-1aeb-d532-0b56-db3918ab0215@gmx.com> References: <821d2cfa-1aeb-d532-0b56-db3918ab0215@gmx.com> Message-ID: Bump on this. Patches with no responses will be abandoned tomorrow.
https://review.opendev.org/#/q/status:open+project:openstack/releases+branch:master+topic:stein-stable On 7/17/20 5:43 PM, Sean McGinnis wrote: > /me takes off release team hat and puts on stable team hat > > Hey everyone, > > To help out with stable releases, I've run a script to propose releases > for any deliverables in stable/stein that had commits merged but not > released yet. This is just to try to help make sure those fixes get out > downstream, and to help ease the crunch that we inevitably have near the > time that stable/stein goes into Extended Maintenance mode (this coming > November). > > These are not driven by the release team, and they are not required. > They are merely a convenience to help out the teams. If there is a patch > for any deliverables owned by your team and you are good with the > release, please leave a +1 and we will process it. Any patches with a > -1, or anything not acknowledged by the end of next week, will just be > abandoned. Of course, stable releases can be proposed by the team > whenever they are ready. > > Again, this is not a release team activity. This may or may not be done > regularly. I just had some time and an itch to do it. > > Patches can be found here: > > https://review.opendev.org/#/q/topic:stein-stable+(status:open+OR+status:merged) > > > Thanks! > > Sean > > From akekane at redhat.com Thu Jul 23 17:50:45 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 23 Jul 2020 23:20:45 +0530 Subject: [glance] Meeting about copy-image race condition Message-ID: Hi All, We (glance team) are organizing one meeting on Monday 27 July at 1400 UTC [1] . During ussuri glance has added the new import method 'copy-image' [2] to copy existing images in multiple stores. Recently while adding a CI job at nova side we have found out that there will be race conditions [3] if two or more simultaneous copy operations were initiated. There iS also WIP fix [4] submitted for the same but there have been lots of objections about the same. Objective of the meeting is to decide the approach to fix this issue. Interested people can join us to share their inputs for the same. [1] https://bluejeans.com/905878787 [2] https://review.opendev.org/696457 [3] https://bugs.launchpad.net/glance/+bug/1884596 [4] https://review.opendev.org/#/c/737596/ Thank you Abhishek -------------- next part -------------- An HTML attachment was scrubbed... URL: From peljasz at yahoo.co.uk Thu Jul 23 18:51:07 2020 From: peljasz at yahoo.co.uk (lejeczek) Date: Thu, 23 Jul 2020 19:51:07 +0100 Subject: floating IP - HA, kind of - how ? References: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13.ref@yahoo.co.uk> Message-ID: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> hi guys, A novice here so go easy on me please. I wonder - is there a mechanism in openstack, a built-in feature where a floating IP could be juggled between guests/instances dynamically - would you know? What comes to mind is something like HA/pacemaker, something where business logic operates around condition and actions. I cannot make is simpler than such an example - instance_A has floating_IPa but if something "bad" happens to it then floating_IPa moves to instance_B - can something like that be handled by openstack's tooling or goes outside its realm and can only be worked out however anybody would do it individually? many thanks, L. -------------- next part -------------- A non-text attachment was scrubbed... 
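For reference, the "move the IP" half of this is a single Neutron operation; what OpenStack does not do on its own is decide when to trigger it. The address and port ID below are placeholders, purely to illustrate the call:

  # Show which port the floating IP is currently associated with.
  openstack floating ip show 203.0.113.10

  # Re-point it at the standby instance's port, e.g. from a health-check
  # script once instance_A is considered unhealthy.
  openstack floating ip set --port <port-id-of-instance_B> 203.0.113.10

The "something bad happened" logic has to live somewhere else: keepalived or pacemaker inside the guests (often combined with an allowed-address-pairs VIP so the floating IP itself never has to move), an external monitoring script driving the API calls above, or, as suggested in the reply further down, Octavia, which packages the VIP plus health monitoring as a managed load-balancing service.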
Name: pEpkey.asc Type: application/pgp-keys Size: 1757 bytes Desc: not available URL: From fungi at yuggoth.org Thu Jul 23 18:57:17 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 23 Jul 2020 18:57:17 +0000 Subject: [Release-job-failures][infra] Release of openstack/oslo.messaging for ref refs/tags/12.2.2 failed In-Reply-To: <76cd6447-79e2-fdcb-befa-eb5b0a5c61b3@openstack.org> References: <76cd6447-79e2-fdcb-befa-eb5b0a5c61b3@openstack.org> Message-ID: <20200723185717.zck3uwsh5i4jku4l@yuggoth.org> On 2020-07-23 14:08:59 +0200 (+0200), Thierry Carrez wrote: [...] > oslo.messaging 12.2.2 - tag OK, build OK, pyPI OK, tarball not published > https://zuul.opendev.org/t/openstack/build/dfce7b106189491b9d9026a079c06bdd > > designate 8.0.1 - tag OK, build OK, pyPI OK, tarball not published > https://zuul.opendev.org/t/openstack/build/0f6c659223df46278c26460b2f3281fe > > Error: > There was an issue creating /afs/.openstack.org as requested: [Errno 13] > Permission denied: b'/afs/.openstack.org' > > Impact: > - Tarballs are missing from tarballs.o.o > - Missing release announces > - Missing constraint updates [...] I've retrieved the copies of the artifacts for these failed writes from PyPI, verified their integrity using the release key signatures included in the build logs, and uploaded the artifacts and signatures to the tarballs site. Both of these builds ran less than 30 minutes apart and, coincidentally, from the same executor (we presently have 12 executors). I tested writing to that same tree from ze11, where the original failures occurred, and encountered no trouble, but that was many hours later. System logs weren't particularly helpful at narrowing down various theories to any one obvious cause (the executor had spontaneously rebooted less than a day earlier, and saw a fairly large time skip at boot due to a >10-minute discrepancy between the system clock and NTP, but I have no evidence to suggest that would have caused this). As a number of other release builds ran successfully in the same timeframe, the most I can surmise is that one of our executors was temporarily unable to write to that AFS volume over the course of half an hour. I'll keep an eye out for any similar issues. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From romain.chanu at univ-lyon1.fr Thu Jul 23 19:01:43 2020 From: romain.chanu at univ-lyon1.fr (CHANU ROMAIN) Date: Thu, 23 Jul 2020 19:01:43 +0000 Subject: floating IP - HA, kind of - how ? In-Reply-To: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> References: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13.ref@yahoo.co.uk>, <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> Message-ID: <1595530904689.76061@univ-lyon1.fr> Hello, You should look into Octavia project: LoadBalancer as a Service. Best Regards, Romain ________________________________________ From: lejeczek Sent: Thursday, July 23, 2020 8:51 PM To: OpenStack Discuss Subject: floating IP - HA, kind of - how ? hi guys, A novice here so go easy on me please. I wonder - is there a mechanism in openstack, a built-in feature where a floating IP could be juggled between guests/instances dynamically - would you know? What comes to mind is something like HA/pacemaker, something where business logic operates around condition and actions. 
I cannot make is simpler than such an example - instance_A has floating_IPa but if something "bad" happens to it then floating_IPa moves to instance_B - can something like that be handled by openstack's tooling or goes outside its realm and can only be worked out however anybody would do it individually? many thanks, L. From rosmaita.fossdev at gmail.com Thu Jul 23 19:43:57 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 23 Jul 2020 15:43:57 -0400 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: References: <8225c61e-687c-0116-da07-52443f315e43@est.tech> <63f0f9f5-0710-bcbd-f8c9-eeea5e5366cb@gmail.com> Message-ID: <9d415a4e-11a7-87f2-337f-07840d1751c8@gmail.com> On 7/23/20 10:56 AM, Előd Illés wrote: [snip] > Maybe before you EOL Cinder's Pike, it would be nice to review & merge > at least the open patches [3], I can help with the review as soon as the > gate fixing patch [4] has merged (which I have already reviewed :)). To > be honest I haven't reviewed yet the other patches because I reviewed > first the gate fixing ones and waited them to get merged. Anyway, I'm > always happy to help with stable reviews, at least from stable core > point of view (but I can only give +1 for patches in Cinder). We've held off on merging anything because if we're going to EOL it anyway, what's the point? -- and we didn't want to reset the 6-month 'unmaintenance' clock. But if: (1) the gates are really working, and (2) the community agrees that we can make a set of final commits to stable/pike and then immediately EOL it -- I think that would be reasonable, especially since it would allow us to merge the fixes for OSSN-0086 into stable/pike, which would be nice (though the patches have been available in Gerrit for anyone who wants them). There are no open reviews for python-cinderclient or python-brick-cinderclient-ext, so we don't have to worry about those repos. With respect to (1), I've got two test patches to make sure the stable/pike cinder and os-brick gates are functional today: - https://review.opendev.org/730959 - https://review.opendev.org/731196 I don't mean to be unreasonable, but if I have to do more than 2 rechecks on either of those to get them to pass, I have no interest in proceeding to step 2. (They both must pass because the ossn-0086 fix must be applied to both cinder and os-brick or it doesn't fix anything.) With respect to (2), the policy reads: "After a project/branch exceeds the time allocation as Unmaintained, or a team decides to explicitly end support for a branch, it will become End of Life." [0] My reading of that "or" is that we would *not* have to wait another 6 months to declare Pike EOL given that the Cinder team has explicitly decided to end support for that branch. If anyone interested in this matter reads the document differently, now would be a good time to speak up. [0] https://opendev.org/openstack/project-team-guide/src/commit/5a8b34fbba7c0744456f5d32167e0295f8578387/doc/source/stable-branches.rst And, just to be clear about what patches are eligible: cinder: - https://review.opendev.org/737094 - https://review.opendev.org/733662 - https://review.opendev.org/734725 - https://review.opendev.org/734723 - https://review.opendev.org/729604 os-brick: - https://review.opendev.org/733615 - https://review.opendev.org/740318 No other reviews will be considered for inclusion. I put a -W on the "Cinder: EOL Pike" patch while we think this over. 
But one way or another, the cinder project stable/pike branches will be EOL by this time next week. > > About the Cinder zuul jobs in EOL candidate branches: I'll go through > the zuul jobs in Pike and Ocata in Cinder to look for unused job > definitions and propose deletion patch if there are such. Thanks, I appreciate it. [snip] From alexander.dibbo at stfc.ac.uk Fri Jul 24 06:34:49 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Fri, 24 Jul 2020 06:34:49 +0000 Subject: Magnum: invalid format of client version In-Reply-To: <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> Message-ID: <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> Hi All, Any other ideas of what to try? Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Alexander Dibbo - UKRI STFC Sent: 22 July 2020 15:28 To: Bharat Kunwar Cc: openstack-discuss at lists.openstack.org Subject: RE: Magnum: invalid format of client version Hi Bharat, Creating a VM from the CLI works fine, as does the heat stack. In both cases using the same image, flavour and keypair/ Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Bharat Kunwar > Sent: 22 July 2020 15:04 To: Dibbo, Alexander (STFC,RAL,SC) > Cc: openstack-discuss at lists.openstack.org Subject: Re: Magnum: invalid format of client version Hi Alex Looks like the following errors are being emitted from Nova via Heat. Would you mind ensuring that you are able to spin up servers using nova CLI and then creating a simple heat stack via heat CLI. Cheers Bharat On 21 Jul 2020, at 13:53, Alexander Dibbo - UKRI STFC > wrote: Jul 17 14:08:24 host-172-16-103-43 magnum-conductor: 2020-07-17 14:08:24.942 6251 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: UnsupportedVersion: : resources.worker_nodes_server_group: : Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of version. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. 
UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Fri Jul 24 06:55:37 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Fri, 24 Jul 2020 08:55:37 +0200 Subject: [tripleo][centos8][ussuri][horizon] horizon container fails to start In-Reply-To: References: Message-ID: thank you, it is restored! :) On Wed, 22 Jul 2020 at 16:04, Ruslanas Gžibovskis wrote: > Hi all, > > Is there any way, how I could modify [0] so tripleO would not download > again my modified image? Or could it be fixed? Should I raise bug report? > is it in the launchpad again? Should it be for Horizon or for TripleO or > Kolla? > And yes, i understand that it tries to use python exec instead of python3 > exec ;) but I am completely new in containerized setup, how to modify it > "by hand" so TripleO deployment would not fix it back, and either way, if > might be not working for more people, or it works for others?! And only me > who is facing this issue? > > [0] > https://github.com/openstack/kolla/blob/12905b5fc18c93fdece91df9a2446771d10dfbad/docker/horizon/extend_start.sh#L18 > ? > > On Thu, 16 Jul 2020 at 10:38, Ruslanas Gžibovskis > wrote: > >> Hi all, >> >> I have noticed, that horizon container fails to start and some >> interestin zen_wozniak has apeared [0]. >> Healthcheck log is empty, but horizon log [1] sais "/usr/bin/python: No >> such file or directory" and there is no such file or directory :) >> >> after sume update it failed. I believe you guys will push update fast >> enough, as I am still bad at this git and container part.... >> HOW to fix it now :) on my side? As tripleo will redeploy horizon from >> images... and will update image. could you please give me a hint where to >> duck tape it whille it will be pushed to prod? >> >> [0] http://paste.openstack.org/show/3jjnsgXfWRxs3o0G6aKH/ >> [1] http://paste.openstack.org/show/1S66A55cz0UaFUWGxID8/ >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> > > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... 
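For anyone who needs a stop-gap before the fixed image is available: one rough duct-tape option (an illustrative sketch only, not the upstream fix -- the image reference below is a placeholder and the script path is an assumption based on the usual Kolla image layout) is to layer the interpreter fix onto the affected image, push the result to the registry the overcloud pulls from, and keep ContainerImagePrepare pointed at that copy so a redeploy does not pull the broken one back in:

  # Dockerfile -- illustrative only; adjust the FROM line to your registry/tag.
  FROM registry.example.local:8787/openstack-horizon:ussuri
  RUN sed -i 's|/usr/bin/python\b|/usr/bin/python3|g' /usr/local/bin/kolla_extend_start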
URL: From alexander.dibbo at stfc.ac.uk Fri Jul 24 08:58:20 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Fri, 24 Jul 2020 08:58:20 +0000 Subject: Magnum: invalid format of client version In-Reply-To: <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> Message-ID: >From doing some more digging I see this in the debug logs on heat 2020-07-24 08:33:10.618 2167043 INFO heat.engine.stack [req-249dfd80-48f4-44f5-9f0e-7c9ac743f047 - admin - default default] Exception in stack validation 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack Traceback (most recent call last): 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 909, in validate 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack result = res.validate() 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server_group.py", line 69, in validate 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack MICROVERSION_SOFT_POLICIES) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 107, in is_version_supported 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack self.get_max_microversion()) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 236, in get_api_version 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack api_version = APIVersion(version_string) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 77, in __init__ 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack raise exceptions.UnsupportedVersion(msg) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack UnsupportedVersion: Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of ve rsion. So it looks like heat is trying to not specify a microversion which is causing an issue with creating server groups. I have tried and I am able to create a server group and assign members with the cli. Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. 
Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Jul 24 10:21:11 2020 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Fri, 24 Jul 2020 12:21:11 +0200 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: <9d415a4e-11a7-87f2-337f-07840d1751c8@gmail.com> References: <8225c61e-687c-0116-da07-52443f315e43@est.tech> <63f0f9f5-0710-bcbd-f8c9-eeea5e5366cb@gmail.com> <9d415a4e-11a7-87f2-337f-07840d1751c8@gmail.com> Message-ID: <55937652-ac1d-6c19-13a1-854ec64e63e8@est.tech> On 2020. 07. 23. 21:43, Brian Rosmaita wrote: > On 7/23/20 10:56 AM, Előd Illés wrote: > [snip] >> Maybe before you EOL Cinder's Pike, it would be nice to review & >> merge at least the open patches [3], I can help with the review as >> soon as the gate fixing patch [4] has merged (which I have already >> reviewed :)). To be honest I haven't reviewed yet the other patches >> because I reviewed first the gate fixing ones and waited them to get >> merged. Anyway, I'm always happy to help with stable reviews, at >> least from stable core point of view (but I can only give +1 for >> patches in Cinder). > > We've held off on merging anything because if we're going to EOL it > anyway, what's the point? -- and we didn't want to reset the 6-month > 'unmaintenance' clock.  But if: > > (1) the gates are really working, and The gate jobs are really working, as I said before :) > (2) the community agrees that we can make a set of final commits to > stable/pike and then immediately EOL it -- > > I think that would be reasonable, especially since it would allow us > to merge the fixes for OSSN-0086 into stable/pike, which would be nice > (though the patches have been available in Gerrit for anyone who wants > them). > > There are no open reviews for python-cinderclient or > python-brick-cinderclient-ext, so we don't have to worry about those > repos. > > With respect to (1), I've got two test patches to make sure the > stable/pike cinder and os-brick gates are functional today: > - https://review.opendev.org/730959 > - https://review.opendev.org/731196 > I don't mean to be unreasonable, but if I have to do more than 2 > rechecks on either of those to get them to pass, I have no interest in > proceeding to step 2.  (They both must pass because the ossn-0086 fix > must be applied to both cinder and os-brick or it doesn't fix anything.) > > With respect to (2), the policy reads: "After a project/branch exceeds > the time allocation as Unmaintained, or a team decides to explicitly > end support for a branch, it will become End of Life." [0]  My reading > of that "or" is that we would *not* have to wait another 6 months to > declare Pike EOL given that the Cinder team has explicitly decided to > end support for that branch.  If anyone interested in this matter > reads the document differently, now would be a good time to speak up. Yes, you read and understand the documentation correctly, we don't have to wait another 6 months (see the patch that introduced this wording: https://review.opendev.org/#/q/I92542012108f0aa07e28968479cdaddf7e06301d ). 
> > [0] > https://opendev.org/openstack/project-team-guide/src/commit/5a8b34fbba7c0744456f5d32167e0295f8578387/doc/source/stable-branches.rst > > And, just to be clear about what patches are eligible: > cinder: > - https://review.opendev.org/737094 > - https://review.opendev.org/733662 > - https://review.opendev.org/734725 > - https://review.opendev.org/734723 > - https://review.opendev.org/729604 > > os-brick: > - https://review.opendev.org/733615 > - https://review.opendev.org/740318 > > No other reviews will be considered for inclusion. > > I put a -W on the "Cinder: EOL Pike" patch while we think this over. > But one way or another, the cinder project stable/pike branches will > be EOL by this time next week. Thanks for thinking this over. OK, Cinder will go EOL next week, I understand. > >> >> About the Cinder zuul jobs in EOL candidate branches: I'll go through >> the zuul jobs in Pike and Ocata in Cinder to look for unused job >> definitions and propose deletion patch if there are such. > > Thanks, I appreciate it. > > [snip] > > Thanks, Előd From alexander.dibbo at stfc.ac.uk Fri Jul 24 11:52:09 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Fri, 24 Jul 2020 11:52:09 +0000 Subject: Magnum: invalid format of client version In-Reply-To: References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> Message-ID: Ok, This definitely seems to be a heat issue. A heat stack created with a server group gives the same error Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Alexander Dibbo - UKRI STFC Sent: 24 July 2020 09:58 To: Bharat Kunwar Cc: openstack-discuss at lists.openstack.org Subject: RE: Magnum: invalid format of client version >From doing some more digging I see this in the debug logs on heat 2020-07-24 08:33:10.618 2167043 INFO heat.engine.stack [req-249dfd80-48f4-44f5-9f0e-7c9ac743f047 - admin - default default] Exception in stack validation 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack Traceback (most recent call last): 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 909, in validate 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack result = res.validate() 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server_group.py", line 69, in validate 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack MICROVERSION_SOFT_POLICIES) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 107, in is_version_supported 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack self.get_max_microversion()) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 236, in get_api_version 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack api_version = APIVersion(version_string) 2020-07-24 08:33:10.618 
2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 77, in __init__ 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack raise exceptions.UnsupportedVersion(msg) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack UnsupportedVersion: Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of ve rsion. So it looks like heat is trying to not specify a microversion which is causing an issue with creating server groups. I have tried and I am able to create a server group and assign members with the cli. Regards Alexander Dibbo - Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Fri Jul 24 13:10:40 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Fri, 24 Jul 2020 14:10:40 +0100 Subject: Magnum: invalid format of client version In-Reply-To: References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> Message-ID: <9ADF0874-46EE-4EAE-BDC9-8C2BAE738290@stackhpc.com> Interesting, what version of Heat and Nova are you using? That looks like a nova client error which I guess Heat is using to make calls to Nova. > On 24 Jul 2020, at 12:52, Alexander Dibbo - UKRI STFC wrote: > > Ok, This definitely seems to be a heat issue. 
A heat stack created with a server group gives the same error > > Regards > > Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader > For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io > To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk > To receive notifications about the service please subscribe to our mailing list at:https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD > To receive fast notifications or to discuss usage of the cloud please join our Slack:https://stfc-cloud.slack.com/ > > From: Alexander Dibbo - UKRI STFC > Sent: 24 July 2020 09:58 > To: Bharat Kunwar > Cc: openstack-discuss at lists.openstack.org > Subject: RE: Magnum: invalid format of client version > > From doing some more digging I see this in the debug logs on heat > 2020-07-24 08:33:10.618 2167043 INFO heat.engine.stack [req-249dfd80-48f4-44f5-9f0e-7c9ac743f047 - admin - default default] Exception in stack validation > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack Traceback (most recent call last): > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 909, in validate > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack result = res.validate() > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server_group.py", line 69, in validate > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack MICROVERSION_SOFT_POLICIES) > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 107, in is_version_supported > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack self.get_max_microversion()) > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 236, in get_api_version > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack api_version = APIVersion(version_string) > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 77, in __init__ > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack raise exceptions.UnsupportedVersion(msg) > 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack UnsupportedVersion: Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of ve > rsion. > > So it looks like heat is trying to not specify a microversion which is causing an issue with creating server groups. I have tried and I am able to create a server group and assign members with the cli. > > Regards > > Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader > For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io > To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk > To receive notifications about the service please subscribe to our mailing list at:https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD > To receive fast notifications or to discuss usage of the cloud please join our Slack:https://stfc-cloud.slack.com/ > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. 
UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. > > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.dibbo at stfc.ac.uk Fri Jul 24 13:33:46 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Fri, 24 Jul 2020 13:33:46 +0000 Subject: Magnum: invalid format of client version In-Reply-To: <9ADF0874-46EE-4EAE-BDC9-8C2BAE738290@stackhpc.com> References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> <9ADF0874-46EE-4EAE-BDC9-8C2BAE738290@stackhpc.com> Message-ID: <4b73bb98cbb84fe388b6c46b3b30ad30@stfc.ac.uk> Hi Bharat, I actually just resolved this issue by adding [clients] Endpoint_type = internalURL To /etc/heat/heat.conf I now have a different issue: 2020-07-24 12:53:06.206 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key api_address 2020-07-24 12:53:06.207 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key kube_masters 2020-07-24 12:53:06.695 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key api_address 2020-07-24 12:53:06.696 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key kube_minions 2020-07-24 12:53:26.271 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.272 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_masters 2020-07-24 12:53:26.321 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.322 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_masters 
2020-07-24 12:53:26.348 24280 ERROR magnum.drivers.heat.driver [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] Nodegroup error, stack status: CREATE_FAILED, stack_id: 77bed453-17f4-4 779-9883-b36a82d26578, reason: Resource CREATE failed: AuthorizationFailure: resources.kube_masters.resources[0].resources.kube-master: Authorization failed. 2020-07-24 12:53:26.670 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.671 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_minions 2020-07-24 12:53:26.727 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.728 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_minions 2020-07-24 12:53:26.755 24280 ERROR magnum.drivers.heat.driver [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] Nodegroup error, stack status: CREATE_FAILED, stack_id: 77bed453-17f4-4 779-9883-b36a82d26578, reason: Resource CREATE failed: AuthorizationFailure: resources.kube_masters.resources[0].resources.kube-master: Authorization failed. Regards Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Bharat Kunwar Sent: 24 July 2020 14:11 To: Dibbo, Alexander (STFC,RAL,SC) Cc: openstack-discuss at lists.openstack.org Subject: Re: Magnum: invalid format of client version Interesting, what version of Heat and Nova are you using? That looks like a nova client error which I guess Heat is using to make calls to Nova. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... 
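For anyone hitting the same UnsupportedVersion traceback, the fix described above corresponds to the following heat.conf snippet (a minimal sketch: the option name is lowercase as oslo.config expects, and it is assumed the heat engine service needs a restart for the change to take effect):

[clients]
endpoint_type = internalURL

Heat also accepts per-service overrides, for example a [clients_nova] section with its own endpoint_type, if only one client should use a different endpoint.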
URL: From alexander.dibbo at stfc.ac.uk Fri Jul 24 13:18:47 2020 From: alexander.dibbo at stfc.ac.uk (Alexander Dibbo - UKRI STFC) Date: Fri, 24 Jul 2020 13:18:47 +0000 Subject: Magnum: invalid format of client version In-Reply-To: <9ADF0874-46EE-4EAE-BDC9-8C2BAE738290@stackhpc.com> References: <6EE3E86A-26BD-4B17-A490-2E895E13AA72@stackhpc.com> <371296b8504f4119a11aeecedf1b02e1@stfc.ac.uk> <7e0dbb4339904df3b8c14c6ec938d528@stfc.ac.uk> <9ADF0874-46EE-4EAE-BDC9-8C2BAE738290@stackhpc.com> Message-ID: <3450216d13f14f5da7c50b8e862344a2@stfc.ac.uk> Hi Bharat, I actually just resolved this issue by adding [clients] Endpoint_type = internalURL To /etc/heat/heat.conf I now have a different issue: 2020-07-24 12:53:06.206 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key api_address 2020-07-24 12:53:06.207 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key kube_masters 2020-07-24 12:53:06.695 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key api_address 2020-07-24 12:53:06.696 24280 WARNING magnum.drivers.heat.template_def [req-9c29405a-d31d-4bd8-8f56-34dc910ff5ce - - - - -] stack does not have output_key kube_minions 2020-07-24 12:53:26.271 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.272 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_masters 2020-07-24 12:53:26.321 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.322 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_masters 2020-07-24 12:53:26.348 24280 ERROR magnum.drivers.heat.driver [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] Nodegroup error, stack status: CREATE_FAILED, stack_id: 77bed453-17f4-4 779-9883-b36a82d26578, reason: Resource CREATE failed: AuthorizationFailure: resources.kube_masters.resources[0].resources.kube-master: Authorization failed. 2020-07-24 12:53:26.670 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.671 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_minions 2020-07-24 12:53:26.727 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key api_address 2020-07-24 12:53:26.728 24280 WARNING magnum.drivers.heat.template_def [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] stack does not have output_key kube_minions 2020-07-24 12:53:26.755 24280 ERROR magnum.drivers.heat.driver [req-5a19296c-754f-4443-99ce-a8377a927b44 - - - - -] Nodegroup error, stack status: CREATE_FAILED, stack_id: 77bed453-17f4-4 779-9883-b36a82d26578, reason: Resource CREATE failed: AuthorizationFailure: resources.kube_masters.resources[0].resources.kube-master: Authorization failed. 
Regards Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack: https://stfc-cloud.slack.com/ From: Bharat Kunwar Sent: 24 July 2020 14:11 To: Dibbo, Alexander (STFC,RAL,SC) Cc: openstack-discuss at lists.openstack.org Subject: Re: Magnum: invalid format of client version Interesting, what version of Heat and Nova are you using? That looks like a nova client error which I guess Heat is using to make calls to Nova. On 24 Jul 2020, at 12:52, Alexander Dibbo - UKRI STFC > wrote: Ok, This definitely seems to be a heat issue. A heat stack created with a server group gives the same error Regards Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at:https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack:https://stfc-cloud.slack.com/ From: Alexander Dibbo - UKRI STFC > Sent: 24 July 2020 09:58 To: Bharat Kunwar > Cc: openstack-discuss at lists.openstack.org Subject: RE: Magnum: invalid format of client version From doing some more digging I see this in the debug logs on heat 2020-07-24 08:33:10.618 2167043 INFO heat.engine.stack [req-249dfd80-48f4-44f5-9f0e-7c9ac743f047 - admin - default default] Exception in stack validation 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack Traceback (most recent call last): 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 909, in validate 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack result = res.validate() 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server_group.py", line 69, in validate 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack MICROVERSION_SOFT_POLICIES) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 107, in is_version_supported 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack self.get_max_microversion()) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 236, in get_api_version 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack api_version = APIVersion(version_string) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 77, in __init__ 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack raise exceptions.UnsupportedVersion(msg) 2020-07-24 08:33:10.618 2167043 ERROR heat.engine.stack UnsupportedVersion: Invalid format of client version ''. Expected format 'X.Y', where X is a major part and Y is a minor part of ve rsion. So it looks like heat is trying to not specify a microversion which is causing an issue with creating server groups. 
I have tried and I am able to create a server group and assign members with the cli. Regards Alexander Dibbo – Cloud Architect / Cloud Operations Group Leader For STFC Cloud Documentation visit https://stfc-cloud-docs.readthedocs.io To raise a support ticket with the cloud team please email cloud-support at gridpp.rl.ac.uk To receive notifications about the service please subscribe to our mailing list at:https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=STFC-CLOUD To receive fast notifications or to discuss usage of the cloud please join our Slack:https://stfc-cloud.slack.com/ This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. Opinions, conclusions or other information in this message and attachments that are not related directly to UKRI business are solely those of the author and do not represent the views of UKRI. -------------- next part -------------- An HTML attachment was scrubbed... URL: From laurentfdumont at gmail.com Fri Jul 24 14:59:21 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Fri, 24 Jul 2020 10:59:21 -0400 Subject: [Ocata][Heat] Strange error returned after stack creation failure -r aw template with id xxx not found In-Reply-To: <7fe6626a-0abb-97ca-fbfb-2066f426b9bf@redhat.com> References: <7fe6626a-0abb-97ca-fbfb-2066f426b9bf@redhat.com> Message-ID: Hey Zane, Thank you so much for the details - super interesting. We've worked with the Vendor to try and reproduce while we had our logs for Heat turned to DEBUG. Unfortunately, all of the creations they have attempted since have worked. It first failed 4 times out of 5 and has since worked... It's one of those problems! We'll keep trying to reproduce. Just to be sure, the actual yaml is stored in the DB and then accessed to create the actual Heat ressources? Thanks! On Wed, Jul 22, 2020 at 3:46 PM Zane Bitter wrote: > On 21/07/20 8:03 pm, Laurent Dumont wrote: > > Hi! > > > > We are currently troubleshooting a Heat stack issue where one of the > > stack (one of 25 or so) is failing to be created properly (seemingly > > randomly). 
> > > > The actual error returned by Heat is quite strange and Google has been > > quite sparse in terms of references. > > > > The actual error looks like the following (I've sanitized some of the > > names): > > > > Resource CREATE failed: resources.potato: Resource CREATE failed: > > resources[0]: raw template with id 22273 not found > > When creating a nested stack, rather than just calling the RPC method to > create a new stack, Heat stores the template in the database first and > passes the ID in the RPC message.[1] (It turns out that by doing it this > way we can save massive amounts of memory when processing a large tree > of nested stacks.) My best guess is that this message indicates that the > template row has been deleted by the time the other engine goes to look > at it. > > I don't see how you could have got an ID like 22273 without the template > having been successfully stored at some point. > > The template is only supposed to be deleted if the RPC call returns with > an error.[2] The only way I can think of for that to happen before an > attempt to create the child stack is if the RPC call times out, but the > original message is eventually picked up by an engine. I would check > your logs for RPC timeouts and consider increasing them. > > What does the status_reason look like at one level above in the tree? > That should indicate the first error that caused the template to be > deleted. > > > heat resource-list STACK_NAME_HERE -n 50 > > > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > > | resource_name | physical_resource_id | > > resource_type | resource_status | updated_time | > > stack_name > > | > > > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > > | potato | RESOURCE_ID_HERE | OS::Heat::ResourceGroup | > > CREATE_FAILED | 2020-07-18 T19:52:10Z | > > nested_stack_1_STACK_NAME_HERE | > > | potato_server_group | RESOURCE_ID_HERE | OS::Nova::ServerGroup | > > CREATE_COMPLETE | 2020-07-21T19:52:10Z | > > nested_stack_1_STACK_NAME_HERE | > > | 0 | | > > potato1.yaml | CREATE_FAILED | 2020-07-18T19:52:12Z | > > nested_stack_2_STACK_NAME_HERE | > > | 1 | | > > potato1.yaml | INIT_COMPLETE | 2020-07- 18 T19:52:12Z | > > nested_stack_2_STACK_NAME_HERE | > > > +------------------+--------------------------------------+-------------------------+-----------------+----------------------+--------------------------------------------------------------------------+ > > > > > > The template itself is pretty simple and attempts to create a > > ServerGroup and 2 VMs (as part of the ResourceGroup). My feeling is that > > one the creation of those machines fails and Heat get's a little cooky > > and returns an error that might not be the actual root cause. I would > > have expected the VM to show up in the resource list but I just see the > > source "yaml". > > It's clear from the above output that the scaled unit of the resource > group is in fact a template (not an OS::Nova::Server), and the error is > occurring trying to create a stack from that template (potato1.yaml) - > before Heat even has a chance to start creating the server. > > > Has anyone seen something similar in the past? > > Nope. > > cheers, > Zane. 
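To make the suggestion about RPC timeouts concrete: the relevant option is oslo.messaging's rpc_response_timeout, which defaults to 60 seconds and lives in the [DEFAULT] section of heat.conf. A sketch of raising it (the value shown is only an example, not a recommendation):

[DEFAULT]
rpc_response_timeout = 120

Raising the timeout only hides slow RPC handling, so it is worth checking first whether the heat-engine workers were overloaded or the RabbitMQ cluster was unhealthy when the nested stack creation failed.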
> > [1] > > https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L367-L384 > [2] > > https://opendev.org/openstack/heat/src/branch/master/heat/engine/resources/stack_resource.py#L335-L342 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Fri Jul 24 18:30:16 2020 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 24 Jul 2020 19:30:16 +0100 Subject: [kolla] Kolla klub meeting next week Message-ID: Hi, The next Kolla klub meeting is scheduled for Thursday at 15:00 UTC. I propose the following topic: Downstream fork amnesty: are you running with locally applied bug fix or feature patches? Would they be useful to the rest of the community? Let's have an open discussion about the useful code that we all might be sitting on, and discuss how we can share it for the benefit of the wider community. There is room for other topics if anyone would like to propose them. We previously postponed these topics: * Gaël: Offline usage of Kolla & Kolla Ansible * Justinas: host OS lifecycle management Can we cover them this week? https://docs.google.com/document/d/1EwQs2GXF-EvJZamEx9vQAOSDB5tCjsDCJyHQN5_4_Sw/edit# Thanks, Mark From melwittt at gmail.com Fri Jul 24 19:51:46 2020 From: melwittt at gmail.com (melanie witt) Date: Fri, 24 Jul 2020 12:51:46 -0700 Subject: [nova][gate] nova-ceph-multistore job failing with NoValidHost Message-ID: Hey all, The nova-ceph-multistore job (devstack-plugin-ceph-tempest-py3 + tweaks to make it run with multiple glance stores) is failing at around a 80% rate as of today. We are tracking the work in this bug: https://bugs.launchpad.net/devstack-plugin-ceph/+bug/1888895 The TL;DR on this is that the ceph bluestore backend when backed by a file will create the file if it doesn't already exist and will create it with a default size. Prior to today, we were pulling ceph version 14.2.10 which defaults the file size to 100G. Then today, we started pulling ceph version 14.2.2 which defaults the file size to 10G which isn't enough space and we're getting NoValidHost with no allocation candidates being returned from placement. We don't know yet what caused us to start pulling an older version tag for ceph. We are currently trying out a WIP fix in the devstack-plugin-ceph repo to configure the bluestore_block_file_size to a reasonable value instead of relying on the default: https://review.opendev.org/742961 We'll keep you updated on the progress as we work on it. Cheers, -melanie From melwittt at gmail.com Fri Jul 24 20:17:39 2020 From: melwittt at gmail.com (melanie witt) Date: Fri, 24 Jul 2020 13:17:39 -0700 Subject: [all][gate] ceph jobs failing with NoValidHost In-Reply-To: References: Message-ID: On 7/24/20 12:51, melanie witt wrote: > Hey all, > > The nova-ceph-multistore job (devstack-plugin-ceph-tempest-py3 + tweaks > to make it run with multiple glance stores) is failing at around a 80% > rate as of today. We are tracking the work in this bug: > > https://bugs.launchpad.net/devstack-plugin-ceph/+bug/1888895 > > The TL;DR on this is that the ceph bluestore backend when backed by a > file will create the file if it doesn't already exist and will create it > with a default size. Prior to today, we were pulling ceph version > 14.2.10 which defaults the file size to 100G. Then today, we started > pulling ceph version 14.2.2 which defaults the file size to 10G which > isn't enough space and we're getting NoValidHost with no allocation > candidates being returned from placement. 
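The Ceph option involved here is bluestore_block_size, the size of the file that BlueStore creates when it is backed by a plain file rather than a block device. For anyone wanting to apply the workaround locally before the devstack-plugin-ceph patch merges, a sketch of pinning it back to roughly the old 100G default in ceph.conf (the exact value and section used by the actual fix may differ):

[osd]
bluestore_block_size = 107374182400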
> > We don't know yet what caused us to start pulling an older version tag > for ceph. > > We are currently trying out a WIP fix in the devstack-plugin-ceph repo > to configure the bluestore_block_file_size to a reasonable value instead s/bluestore_block_file_size/bluestore_block_size/ > of relying on the default: > > https://review.opendev.org/742961 > > We'll keep you updated on the progress as we work on it. Updating this to [all][gate] because it appears it's not only the nova-ceph-multistore job that's affected but all devstack-plugin-ceph-tempest-py3 jobs. We found NoValidHost failures on patches proposed to openstack/glance and openstack/tempest as well. The fix for all should be the same (patch in devstack-plugin-ceph) so once we get that working well, the ceph jobs should be fixed for [all][gate]. -melanie From peljasz at yahoo.co.uk Sat Jul 25 09:07:23 2020 From: peljasz at yahoo.co.uk (lejeczek) Date: Sat, 25 Jul 2020 10:07:23 +0100 Subject: floating IP - HA, kind of - how ? In-Reply-To: <1595530904689.76061@univ-lyon1.fr> References: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13.ref@yahoo.co.uk> <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> <1595530904689.76061@univ-lyon1.fr> Message-ID: <035f6644-4985-e879-e176-96b865cf10af@yahoo.co.uk> On 23/07/2020 20:01, CHANU ROMAIN wrote: > Hello, > > You should look into Octavia project: LoadBalancer as a Service. > > Best Regards, > Romain > ________________________________________ > From: lejeczek > Sent: Thursday, July 23, 2020 8:51 PM > To: OpenStack Discuss > Subject: floating IP - HA, kind of - how ? > > hi guys, > > A novice here so go easy on me please. > I wonder - is there a mechanism in openstack, a built-in > feature where a floating IP could be juggled between > guests/instances dynamically - would you know? > What comes to mind is something like HA/pacemaker, something > where business logic operates around condition and actions. > I cannot make is simpler than such an example - instance_A > has floating_IPa but if something "bad" happens to it then > floating_IPa moves to instance_B - can something like that > be handled by openstack's tooling or goes outside its realm > and can only be worked out however anybody would do it > individually? > > many thanks, L. Would what you suggest be achievable, doable by a non-admin? -------------- next part -------------- A non-text attachment was scrubbed... Name: pEpkey.asc Type: application/pgp-keys Size: 1757 bytes Desc: not available URL: From zigo at debian.org Sun Jul 26 10:51:32 2020 From: zigo at debian.org (Thomas Goirand) Date: Sun, 26 Jul 2020 12:51:32 +0200 Subject: floating IP - HA, kind of - how ? In-Reply-To: <035f6644-4985-e879-e176-96b865cf10af@yahoo.co.uk> References: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13.ref@yahoo.co.uk> <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> <1595530904689.76061@univ-lyon1.fr> <035f6644-4985-e879-e176-96b865cf10af@yahoo.co.uk> Message-ID: On 7/25/20 11:07 AM, lejeczek wrote: > > > On 23/07/2020 20:01, CHANU ROMAIN wrote: >> Hello, >> >> You should look into Octavia project: LoadBalancer as a Service. >> >> Best Regards, >> Romain >> ________________________________________ >> From: lejeczek >> Sent: Thursday, July 23, 2020 8:51 PM >> To: OpenStack Discuss >> Subject: floating IP - HA, kind of - how ? >> >> hi guys, >> >> A novice here so go easy on me please. >> I wonder - is there a mechanism in openstack, a built-in >> feature where a floating IP could be juggled between >> guests/instances dynamically - would you know? 
>> What comes to mind is something like HA/pacemaker, something >> where business logic operates around condition and actions. >> I cannot make is simpler than such an example - instance_A >> has floating_IPa but if something "bad" happens to it then >> floating_IPa moves to instance_B - can something like that >> be handled by openstack's tooling or goes outside its realm >> and can only be worked out however anybody would do it >> individually? >> >> many thanks, L. > Would what you suggest be achievable, doable by a non-admin? As long as Octavia is installed, yes. If not, then you can achieve what Octavia does using VRRP ports sharing a floating IP (which you wouldn't assign, just reserve). That's in fact more or less what Octavia does. Cheers, Thomas Goirand (zigo) From zigo at debian.org Sun Jul 26 15:43:50 2020 From: zigo at debian.org (Thomas Goirand) Date: Sun, 26 Jul 2020 17:43:50 +0200 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: References: Message-ID: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> On 7/23/20 6:30 PM, OpenStack Foundation wrote: > *STEP TWO* > *Submit your talks !*  > The Call For > Presentations > (CFP) deadline is August > 4 at 11:59pm PT, so start drafting your presentations and panels around > Open Infrastructure use cases like AI/Machine Learning, CI/CD, Container > Infrastructure, Edge Computing and of course, Public, Private and Hybrid > Clouds. I can't edit my BIO, as it says the email is invalid, however, I cannot edit it (the field is disabled). Who's in charge of cfp.openstack.org ? Cheers, Thomas Goirand (zigo) From jimmy at openstack.org Sun Jul 26 19:56:29 2020 From: jimmy at openstack.org (Jimmy McArthur) Date: Sun, 26 Jul 2020 14:56:29 -0500 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> Message-ID: Hey Thomas, The controls to edit basic contact info have actually moved to OpenStackID (https://openstackid.org/accounts/user/profile).  There are still additional fields on https://www.openstack.org/profile/ and https://www.openstack.org/profile/speaker, but since OpenStackID also works with a number of other services, we moved those common fields to OpenStackID account management. Please let me know if you have additional questions.  And keep in mind, if you update your email address, it will also change your OpenStackID login. Cheers and let me know if you have additional questions, Jimmy Thomas Goirand wrote on 7/26/20 10:43 AM: > On 7/23/20 6:30 PM, OpenStack Foundation wrote: >> *STEP TWO* >> *Submit your talks !* >> The Call For >> Presentations >> (CFP) deadline is August >> 4 at 11:59pm PT, so start drafting your presentations and panels around >> Open Infrastructure use cases like AI/Machine Learning, CI/CD, Container >> Infrastructure, Edge Computing and of course, Public, Private and Hybrid >> Clouds. > I can't edit my BIO, as it says the email is invalid, however, I cannot > edit it (the field is disabled). Who's in charge of cfp.openstack.org ? > > Cheers, > > Thomas Goirand (zigo) > From josephine.seifert at secustack.com Mon Jul 27 07:05:33 2020 From: josephine.seifert at secustack.com (Josephine Seifert) Date: Mon, 27 Jul 2020 09:05:33 +0200 Subject: [Image-Encryption] No meeting today Message-ID: <954dfb53-2fdd-46ba-98e1-7a025b58ad16@secustack.com> Hi, there won't be a popup team meeting today. We will resume next week. 
greetings Josephine (Luzi) From zigo at debian.org Mon Jul 27 07:08:35 2020 From: zigo at debian.org (Thomas Goirand) Date: Mon, 27 Jul 2020 09:08:35 +0200 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> Message-ID: <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> On 7/26/20 9:56 PM, Jimmy McArthur wrote: > Hey Thomas, > > The controls to edit basic contact info have actually moved to > OpenStackID (https://openstackid.org/accounts/user/profile).  There are > still additional fields on https://www.openstack.org/profile/ and > https://www.openstack.org/profile/speaker, but since OpenStackID also > works with a number of other services, we moved those common fields to > OpenStackID account management. > > Please let me know if you have additional questions.  And keep in mind, > if you update your email address, it will also change your OpenStackID > login. > > Cheers and let me know if you have additional questions, > Jimmy Hi Jimmy, THanks for your quick answer. Well, there's a bug then... When I got to: https://cfp.openstack.org/app/profile under the Email field, it displays tho%2A%2A%40goirand.fr which I cannot edit. Then when I click on SAVE, I'm being told that the email isn't a valid one (but I cannot edit it...). As a result, I can never save my updated bio... Cheers, Thomas Goirand (zigo) From skaplons at redhat.com Mon Jul 27 07:56:22 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 27 Jul 2020 09:56:22 +0200 Subject: floating IP - HA, kind of - how ? In-Reply-To: References: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13.ref@yahoo.co.uk> <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> <1595530904689.76061@univ-lyon1.fr> <035f6644-4985-e879-e176-96b865cf10af@yahoo.co.uk> Message-ID: <67EAA3A2-A759-4F82-8B2F-3904AE9EC1CB@redhat.com> Hi, > On 26 Jul 2020, at 12:51, Thomas Goirand wrote: > > On 7/25/20 11:07 AM, lejeczek wrote: >> >> >> On 23/07/2020 20:01, CHANU ROMAIN wrote: >>> Hello, >>> >>> You should look into Octavia project: LoadBalancer as a Service. >>> >>> Best Regards, >>> Romain >>> ________________________________________ >>> From: lejeczek >>> Sent: Thursday, July 23, 2020 8:51 PM >>> To: OpenStack Discuss >>> Subject: floating IP - HA, kind of - how ? >>> >>> hi guys, >>> >>> A novice here so go easy on me please. >>> I wonder - is there a mechanism in openstack, a built-in >>> feature where a floating IP could be juggled between >>> guests/instances dynamically - would you know? >>> What comes to mind is something like HA/pacemaker, something >>> where business logic operates around condition and actions. >>> I cannot make is simpler than such an example - instance_A >>> has floating_IPa but if something "bad" happens to it then >>> floating_IPa moves to instance_B - can something like that >>> be handled by openstack's tooling or goes outside its realm >>> and can only be worked out however anybody would do it >>> individually? >>> >>> many thanks, L. >> Would what you suggest be achievable, doable by a non-admin? > > As long as Octavia is installed, yes. If not, then you can achieve what > Octavia does using VRRP ports sharing a floating IP (which you wouldn't > assign, just reserve). That's in fact more or less what Octavia does. IIRC Octavia is not using Neutron’ Floating IP but it creates additional port with fixed IP address and uses this IP to move it between keepalived instances. 
If You want to do that, please remember to add this additional IP address to allowed address pair in ports connected to Your VMs. Please also be aware of bug [1] which causes some problems with such setup when DVR is used. > > Cheers, > > Thomas Goirand (zigo) [1] https://bugs.launchpad.net/neutron/+bug/1774459 — Slawek Kaplonski Principal software engineer Red Hat From arnaud.morin at gmail.com Mon Jul 27 08:52:51 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 27 Jul 2020 08:52:51 +0000 Subject: [ops] Reviving OSOps ? In-Reply-To: <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> Message-ID: <20200727085251.GJ31915@sync> +1 for reviving OSops. Within OVH, we choose OpenStack for its community and openness, so we will try to participate in such effort! Cheers, -- Arnaud Morin On 17.07.20 - 15:19, Thierry Carrez wrote: > Hi everyone, > > During the last Opendev event we discussed reviving the OSops[1] idea: a > lightweight area where operators can share the various small tools that they > end up creating to help them operate OpenStack deployments. The effort has > been mostly dormant for a few years. > > We had a recent thread[2] about osarchiver, a new operators helper, and > whether it would make sense to push it upstream. I think the best option > would be to revive OSops and land it there. > > Who is interested in helping to revive/maintain this ? > > If we revive it, I think we should move its repositories away from the > catch-all "x" directory under opendev, which was created for projects that > were not claimed by anyone during the big migration. > > If Osops should be considered distinct from OpenStack, then I'd recommend > giving it its own opendev top directory, and move existing x/osops-* > repositories to osops/*. > > If we'd like to make OSops a product of the OpenStack community (and have > contributions to it be fully recognized as contributions to "OpenStack"), > then I'd recommend creating a specific SIG dedicated to this, and move the > x/osops-* repositories to openstack/osops-*. > > [1] https://wiki.openstack.org/wiki/Osops > [2] > http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015977.html > > -- > Thierry Carrez (ttx) > From arnaud.morin at gmail.com Mon Jul 27 09:57:44 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 27 Jul 2020 09:57:44 +0000 Subject: [largescale-sig] RPC ping Message-ID: <20200727095744.GK31915@sync> Hey all, TLDR: I propose a change to oslo_messaging to allow doing a ping over RPC, this is useful to monitor liveness of agents. Few weeks ago, I proposed a patch to oslo_messaging [1], which is adding a ping endpoint to RPC dispatcher. It means that every openstack service which is using oslo_messaging RPC endpoints (almosts all OpenStack services and agents - e.g. neutron server + agents, nova + computes, etc.) will then be able to answer to a specific "ping" call over RPC. I decided to propose this patch in my company mainly for 2 reasons: 1 - we are struggling monitoring our nova compute and neutron agents in a correct way: 1.1 - sometimes our agents are disconnected from RPC, but the python process is still running. 1.2 - sometimes the agent is still connected, but the queue / binding on rabbit cluster is not working anymore (after a rabbit split for example). 
This one is very hard to debug, because the agent is still reporting health correctly on neutron server, but it's not able to receive messages anymore. 2 - we are trying to monitor agents running in k8s pods: when running a python agent (neutron l3-agent for example) in a k8s pod, we wanted to find a way to monitor if it is still live of not. Adding a RPC ping endpoint could help us solve both these issues. Note that we still need an external mechanism (out of OpenStack) to do this ping. We also think it could be nice for other OpenStackers, and especially large scale ops. Feel free to comment. [1] https://review.opendev.org/#/c/735385/ -- Arnaud Morin From kendall at openstack.org Mon Jul 27 15:02:10 2020 From: kendall at openstack.org (Kendall Waters) Date: Mon, 27 Jul 2020 10:02:10 -0500 Subject: October/November PTG Date Selection Message-ID: Hello Everyone! We wanted to get your input on the best dates for holding the PTG now that its been annouced that the Summit itself will be virtual[1]. Please fill out the CIVS poll by August 3 at 15:00 UTC. We really appreicate your feedback! Poll: https://civs.cs.cornell.edu/cgi-bin/vote.pl?id=E_a69944444a9b7c93&akey=3eccaa5b4fa5cfe6 There are obviously many pros and cons to each option, below are the conflicts/reasons why we might not want to select that option. Option 0 - Two weeks before Summit (October 5-9) Conflicts with RC work External Event Conflicts: None? Option 1 - One week after Summit (October 26-30) Halloween on Oct 31 Perfect for release timing but event fatigue is a real concern External Event Conflicts: Open Source Summit EU, ODSC Open Data Science Conference East Cloud Foundry Summit EU, KVM Forum Option 2 - Two weeks after the Summit (November 2-6) USA Election day on Nov 3; could be a distraction, contributors needing to go vote (whatever that looks like with the virus), news, etc. Event Conflicts: None? Option 3 - Three weeks after the Summit (November 9-13) Pushes far into the cycle (4 weeks after previous release, a bit late, Veterns day/holiday - may affect France and Germany and US) External Events Conflicts: VMWorld EU -The Kendalls (diablo_rojo & wendallkaters) [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016068.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Mon Jul 27 15:10:06 2020 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Mon, 27 Jul 2020 17:10:06 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: Arnaud, thanks for sharing this tool. On Thu, Jul 16, 2020 at 3:48 PM Arnaud Morin wrote: > Hello large-scalers! > > TLDR: we opensource a tool to help reducing size of databases. > See https://github.com/ovh/osarchiver/ > > > Few months ago, we released a tool, name osarchiver, which we are using > on our production environment (at OVH) to help reduce the size of our > tables in mariadb (or mysql) > > In fact, some tables are well know to grow very quickly. > > We use it, for example, to clean the OpenStack mistral database from old > tasks, actions and executions which are older than a year. > > Another use case could be to archive some data in another table (e.g. with > _archived as suffix) if they are 6 months old, and delete this data after > 1 year. 
> > The source code of this tool is available here: > https://github.com/ovh/osarchiver/ > > We were wondering if some other users would be interested in using the > tool, and maybe move it under the opendev governance? > > Feel free to contact us and/or answer this thread. > > Cheers, > > -- > Arnaud, Pierre-Samuel and OVH team > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yan.y.zhao at intel.com Mon Jul 27 07:24:40 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Mon, 27 Jul 2020 15:24:40 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200721005113.GA10502@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> Message-ID: <20200727072440.GA28676@joy-OptiPlex-7040> > > As you indicate, the vendor driver is responsible for checking version > > information embedded within the migration stream. Therefore a > > migration should fail early if the devices are incompatible. Is it > but as I know, currently in VFIO migration protocol, we have no way to > get vendor specific compatibility checking string in migration setup stage > (i.e. .save_setup stage) before the device is set to _SAVING state. > In this way, for devices who does not save device data in precopy stage, > the migration compatibility checking is as late as in stop-and-copy > stage, which is too late. > do you think we need to add the getting/checking of vendor specific > compatibility string early in save_setup stage? > hi Alex, after an offline discussion with Kevin, I realized that it may not be a problem if migration compatibility check in vendor driver occurs late in stop-and-copy phase for some devices, because if we report device compatibility attributes clearly in an interface, the chances for libvirt/openstack to make a wrong decision is little. so, do you think we are now arriving at an agreement that we'll give up the read-and-test scheme and start to defining one interface (perhaps in json format), from which libvirt/openstack is able to parse and find out compatibility list of a source mdev/physical device? Thanks Yan From jimmy at openstack.org Mon Jul 27 15:54:40 2020 From: jimmy at openstack.org (Jimmy McArthur) Date: Mon, 27 Jul 2020 10:54:40 -0500 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> Message-ID: <28b311a0-0de8-2929-fc7b-4fc513977204@openstack.org> That does indeed sound like a bug.  Let me test with your account and I'll update ASAP. Cheers, Jimmy Thomas Goirand wrote on 7/27/20 2:08 AM: > Well, there's a bug then... > > When I got to: > https://cfp.openstack.org/app/profile > > under the Email field, it displays tho%2A%2A%40goirand.fr which I cannot > edit. Then when I click on SAVE, I'm being told that the email isn't a > valid one (but I cannot edit it...). > > As a result, I can never save my updated bio... 
From iurygregory at gmail.com Mon Jul 27 15:55:16 2020 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 27 Jul 2020 17:55:16 +0200 Subject: [ironic] let's talk about grenade Message-ID: Hello everyone, I'm still on the fight to move our ironic-grenade-dsvm-multinode-multitenant to zuulv3 [1], you can find some of my findings on the etherpad [2] under `Move to Zuul v3 Jobs (Iurygregory)`. If you are interested in helping out we are going to schedule a meeting to discuss about this, please use the doodle in [3]. I will close the doodle on Wed July 29. Thanks! [1] https://review.opendev.org/705030 [2] https://etherpad.openstack.org/p/IronicWhiteBoard [3] https://doodle.com/poll/m69b5zwnsbgcysct -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Mon Jul 27 16:05:48 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 27 Jul 2020 09:05:48 -0700 Subject: floating IP - HA, kind of - how ? In-Reply-To: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> References: <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13.ref@yahoo.co.uk> <0a80ea82-3b9c-07ed-8ffa-0fc1bf514d13@yahoo.co.uk> Message-ID: Hi lejeczek, What you are describing is what a load balancer provides. The load balancing project for OpenStack is Octavia. We have a cookbook that can guide you through setting up a load balancer: https://docs.openstack.org/octavia/latest/user/guides/basic-cookbook.html We also gave an OpenStack summit presentation that is an introduction to load balancing: https://youtu.be/BBgP3_qhJ00 Load balancers provide the health monitoring and automatic redirection of network traffic. They can also provide this failover much faster than reconfiguring a floating IP. Michael On Thu, Jul 23, 2020 at 11:56 AM lejeczek wrote: > > hi guys, > > A novice here so go easy on me please. > I wonder - is there a mechanism in openstack, a built-in > feature where a floating IP could be juggled between > guests/instances dynamically - would you know? > What comes to mind is something like HA/pacemaker, something > where business logic operates around condition and actions. > I cannot make is simpler than such an example - instance_A > has floating_IPa but if something "bad" happens to it then > floating_IPa moves to instance_B - can something like that > be handled by openstack's tooling or goes outside its realm > and can only be worked out however anybody would do it > individually? > > many thanks, L. From melwittt at gmail.com Mon Jul 27 16:31:14 2020 From: melwittt at gmail.com (melanie witt) Date: Mon, 27 Jul 2020 09:31:14 -0700 Subject: [all][gate] ceph jobs failing with NoValidHost In-Reply-To: References: Message-ID: On 7/24/20 13:17, melanie witt wrote: >> We'll keep you updated on the progress as we work on it. > > Updating this to [all][gate] because it appears it's not only the > nova-ceph-multistore job that's affected but all > devstack-plugin-ceph-tempest-py3 jobs. We found NoValidHost failures on > patches proposed to openstack/glance and openstack/tempest as well. > > The fix for all should be the same (patch in devstack-plugin-ceph) so > once we get that working well, the ceph jobs should be fixed for > [all][gate]. 
The fix https://review.opendev.org/742961 has merged, so the ceph jobs should be back to normal again. Cheers, -melanie From openstack at nemebean.com Mon Jul 27 16:41:22 2020 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 27 Jul 2020 11:41:22 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <20200727095744.GK31915@sync> References: <20200727095744.GK31915@sync> Message-ID: <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> Tagging with Nova and Neutron as they are mentioned and I thought some people from those teams had opinions on this. Can you refresh my memory on why we dropped this before? I recall talking about it in Denver, but I can't for the life of me remember what the conclusion was. Did we intend to use something else for this that has since fallen through? On 7/27/20 4:57 AM, Arnaud Morin wrote: > Hey all, > > TLDR: I propose a change to oslo_messaging to allow doing a ping over RPC, > this is useful to monitor liveness of agents. > > > Few weeks ago, I proposed a patch to oslo_messaging [1], which is adding a > ping endpoint to RPC dispatcher. > It means that every openstack service which is using oslo_messaging RPC > endpoints (almosts all OpenStack services and agents - e.g. neutron > server + agents, nova + computes, etc.) will then be able to answer to a > specific "ping" call over RPC. > > I decided to propose this patch in my company mainly for 2 reasons: > 1 - we are struggling monitoring our nova compute and neutron agents in a > correct way: > > 1.1 - sometimes our agents are disconnected from RPC, but the python process > is still running. > 1.2 - sometimes the agent is still connected, but the queue / binding on > rabbit cluster is not working anymore (after a rabbit split for > example). This one is very hard to debug, because the agent is still > reporting health correctly on neutron server, but it's not able to > receive messages anymore. > > > 2 - we are trying to monitor agents running in k8s pods: > when running a python agent (neutron l3-agent for example) in a k8s pod, we > wanted to find a way to monitor if it is still live of not. > > > Adding a RPC ping endpoint could help us solve both these issues. > Note that we still need an external mechanism (out of OpenStack) to do this > ping. > We also think it could be nice for other OpenStackers, and especially > large scale ops. > > Feel free to comment. > > > [1] https://review.opendev.org/#/c/735385/ > > From dms at danplanet.com Mon Jul 27 17:08:35 2020 From: dms at danplanet.com (Dan Smith) Date: Mon, 27 Jul 2020 10:08:35 -0700 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> (Ben Nemec's message of "Mon, 27 Jul 2020 11:41:22 -0500") References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> Message-ID: > Tagging with Nova and Neutron as they are mentioned and I thought some > people from those teams had opinions on this. Nova already implements ping() on the compute RPC interface, which we use to make sure compute waits to start up until conductor is available to do its bidding. So if a new obligatory RPC server method is actually added called ping(), it will break us. > Can you refresh my memory on why we dropped this before? I recall > talking about it in Denver, but I can't for the life of me remember > what the conclusion was. Did we intend to use something else for this > that has since fallen through? 
The prior conversation I recall was about helm sitting on our bus to (ab)use our ping method for health checks: https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa I believe that has since been reverted. The primary concern was about something other than nova sitting on our bus making calls to our internal services. I imagine that the proposal to bake it into oslo.messaging is for the same purpose, and I'd probably have the same concern. At the time I think we agreed that if we were going to support direct-to-service health checks, they should be teensy HTTP servers with oslo healthchecks middleware. Further loading down rabbit with those pings doesn't seem like the best plan to me. Especially since Nova (compute) services already check in over RPC periodically and the success of that is discoverable en masse through the API. --Dan From openstack at nemebean.com Mon Jul 27 17:38:42 2020 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 27 Jul 2020 12:38:42 -0500 Subject: [ops] Reviving OSOps ? In-Reply-To: <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> Message-ID: <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> On 7/17/20 8:19 AM, Thierry Carrez wrote: > If Osops should be considered distinct from OpenStack That feels like the wrong statement to make, even if only implicitly by repo organization. Is there a compelling reason not to have osops under the openstack namespace? From sean.mcginnis at gmx.com Mon Jul 27 17:58:30 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Mon, 27 Jul 2020 12:58:30 -0500 Subject: [ops] Reviving OSOps ? In-Reply-To: <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> Message-ID: <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> >> If Osops should be considered distinct from OpenStack > > That feels like the wrong statement to make, even if only implicitly > by repo organization. Is there a compelling reason not to have osops > under the openstack namespace? > I think it makes the most sense to be under the openstack namespace. We have the Operations Docs SIG right now that took on some of the operator-specific documentation that no longer had a home. This was a consistent issue brought up in the Ops Meetup events. While not "wildly successful" in getting a bunch of new and updated docs, it at least has accomplished the main goal of getting these docs published to docs.openstack.org again, and providing a place where more collaboration can (and occasionally does) happen to improve those docs. I think we could probably expand the scope of this SIG. Especially considering it is a pretty low-volume SIG anyway. I would be good with changing this to something like the "Operator Docs and Tooling SIG" and getting any of these useful tooling repos under governance through that. I personally wouldn't be able to spend a lot of time working on anything under the SIG, but I'd be happy to keep an eye out for any new reviews and help get those through. Sean From mark at stackhpc.com Mon Jul 27 18:06:24 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 27 Jul 2020 19:06:24 +0100 Subject: London Openinfra virtual meetup Message-ID: Hi, The London Openinfra meetup group [1] are hosting a virtual meetup [2] this Thursday (30th July) at 17:00 UTC. 
Owing to the magic of video streaming [3], attendees do not need to be physically present in London to watch! Here's the schedule: 6:00pm to 6:30pm - Introductions, OpenStack 10th anniversary retrospective 6:30pm to 7:15pm - Edge ecosystem, use cases and architectures (Ildikó Vancsa, OpenStack Foundation and Gergely Csatari, Nokia) 7:20pm to 8:05pm - Kayobe & Kolla - sane OpenStack deployment (Mark Goddard, StackHPC 8:15pm to 9:00pm - TBC I can guarantee that Ildikó and Gergely's talk will be great. As for the other one, you'll have to judge for yourselves ;) Cheers, Mark [1] https://www.meetup.com/OpenInfra-London/ [2] https://www.meetup.com/OpenInfra-London/events/272083028/ [3] https://www.youtube.com/watch?v=0liqSO0SZ60 From mark at stackhpc.com Mon Jul 27 18:07:29 2020 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 27 Jul 2020 19:07:29 +0100 Subject: London Openinfra virtual meetup In-Reply-To: References: Message-ID: On Mon, 27 Jul 2020 at 19:06, Mark Goddard wrote: > > Hi, > > The London Openinfra meetup group [1] are hosting a virtual meetup [2] > this Thursday (30th July) at 17:00 UTC. Owing to the magic of video > streaming [3], attendees do not need to be physically present in > London to watch! > > Here's the schedule: > > 6:00pm to 6:30pm - Introductions, OpenStack 10th anniversary retrospective > 6:30pm to 7:15pm - Edge ecosystem, use cases and architectures (Ildikó > Vancsa, OpenStack Foundation and Gergely Csatari, Nokia) > 7:20pm to 8:05pm - Kayobe & Kolla - sane OpenStack deployment (Mark > Goddard, StackHPC > 8:15pm to 9:00pm - TBC These times are BST, which is UTC+1. > > I can guarantee that Ildikó and Gergely's talk will be great. As for > the other one, you'll have to judge for yourselves ;) > > Cheers, > Mark > > [1] https://www.meetup.com/OpenInfra-London/ > [2] https://www.meetup.com/OpenInfra-London/events/272083028/ > [3] https://www.youtube.com/watch?v=0liqSO0SZ60 From amy at demarco.com Mon Jul 27 18:44:30 2020 From: amy at demarco.com (Amy Marrich) Date: Mon, 27 Jul 2020 13:44:30 -0500 Subject: [ops] Reviving OSOps ? In-Reply-To: <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> Message-ID: +1 on combining this in with the existing SiG and efforts. Amy (spotz) On Mon, Jul 27, 2020 at 1:02 PM Sean McGinnis wrote: > > >> If Osops should be considered distinct from OpenStack > > > > That feels like the wrong statement to make, even if only implicitly > > by repo organization. Is there a compelling reason not to have osops > > under the openstack namespace? > > > I think it makes the most sense to be under the openstack namespace. > > We have the Operations Docs SIG right now that took on some of the > operator-specific documentation that no longer had a home. This was a > consistent issue brought up in the Ops Meetup events. While not "wildly > successful" in getting a bunch of new and updated docs, it at least has > accomplished the main goal of getting these docs published to > docs.openstack.org again, and providing a place where more collaboration > can (and occasionally does) happen to improve those docs. > > I think we could probably expand the scope of this SIG. Especially > considering it is a pretty low-volume SIG anyway. 
I would be good with > changing this to something like the "Operator Docs and Tooling SIG" and > getting any of these useful tooling repos under governance through that. > I personally wouldn't be able to spend a lot of time working on anything > under the SIG, but I'd be happy to keep an eye out for any new reviews > and help get those through. > > Sean > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Mon Jul 27 19:28:58 2020 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 27 Jul 2020 15:28:58 -0400 Subject: Nate bug deputy notes - 2020-07-20 to 2020-07-27 Message-ID: <20200727192858.wxwova565mxgiw6l@firewall> All, Here are my notes from this week as bug deputy. There are two issues that, because of IPv6 defects in my home lab, I was not able to confirm. Other than these everything has an assignee and/or a fix being worked on. Untriaged: - "Neutron start radvd and mess up the routing table when: ipv6_ra_mode=not set ipv6-address-mode=slaac" - URL: https://bugs.launchpad.net/bugs/1888256 - Version: stable/rocky - Tags: ipv6 - Assignee: none - "IPv6 PD with DVR does not assign correct snat sg address" - URL: https://bugs.launchpad.net/bugs/1888464 - Version: stable/stein - Tags: ipv6 - Assignee: none Critical: - "[OVN]: creating a local switch port that has no tags should not fail" - URL: https://bugs.launchpad.net/bugs/1888736 - Assignee: ffernand - Fix: https://review.opendev.org/742758 merged High: - "[FT] neutron.tests.functional.agent.linux.test_iptables.IptablesManagerNonRootTestCase test cases always failing" - URL: https://bugs.launchpad.net/bugs/1888213 - Tags: gate-failure - Assignee: ralonsoh - Fix: https://review.opendev.org/741957 released - '[neutron-tempest-plugin] greendns query has no attribute "_compute_expiration"' - URL: https://bugs.launchpad.net/bugs/1888258 - Fix: https://review.opendev.org/741986 released - "[OVN Octavia Provider] octavia_tempest_plugin.tests.api.v2.test_member.MemberAPITest.test_member_batch_update fails" - URL: https://bugs.launchpad.net/bugs/1888489 - Tags: ovn-octavia-provider - Assignee: maciej.jozefczyk - Fix: https://review.opendev.org/742410 - "[OVN Octavia Provider] octavia_tempest_plugin.tests.api.v2.test_pool.PoolAPITest.test_pool_create_with_listener fails" - URL: https://bugs.launchpad.net/neutron/+bug/1888646 - Tags: ovn-octavia-provider - Assignee: maciej.jozefczyk - "OVN: Extra DHCP options validation broke the Ironic+OVN+DHCP agent combination" - URL: https://bugs.launchpad.net/bugs/1888649 - Assignee: lucasgomes - Fix: https://review.opendev.org/742654 released - "[FT] TestMaintenance.test_port failing" - URL: https://bugs.launchpad.net/bugs/1888828 - Assignee: ralonsoh - Fix: https://review.opendev.org/742868 Medium: - "neutron-sanity-check fails with NoSuchOptError: no such option vf_management" - URL: https://bugs.launchpad.net/bugs/1888920 - Assignee: ralonsoh - Fix: https://review.opendev.org/743190 Wishlist: - "Change the default value of "propagate_uplink_status" to True" - URL: https://bugs.launchpad.net/bugs/1888487 - "Improve core plugin extension filtering using the mechanism driver information" - URL: https://bugs.launchpad.net/bugs/1888829 Next up is Akihiro. 
Last week's report by lajoskatona is here: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016002.html Thanks, Nate From nate.johnston at redhat.com Mon Jul 27 19:36:05 2020 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 27 Jul 2020 15:36:05 -0400 Subject: [neutron] Nate bug deputy notes - 2020-07-20 to 2020-07-27 In-Reply-To: <20200727192858.wxwova565mxgiw6l@firewall> References: <20200727192858.wxwova565mxgiw6l@firewall> Message-ID: <20200727193605.s3kyvwq7whtpow6f@firewall> Apologies, resending with "[neutron]" tag in subject. Nate On Mon, Jul 27, 2020 at 03:28:58PM -0400, Nate Johnston wrote: > All, > > Here are my notes from this week as bug deputy. There are two issues that, > because of IPv6 defects in my home lab, I was not able to confirm. Other than > these everything has an assignee and/or a fix being worked on. > > Untriaged: > > - "Neutron start radvd and mess up the routing table when: ipv6_ra_mode=not set ipv6-address-mode=slaac" > - URL: https://bugs.launchpad.net/bugs/1888256 > - Version: stable/rocky > - Tags: ipv6 > - Assignee: none > > - "IPv6 PD with DVR does not assign correct snat sg address" > - URL: https://bugs.launchpad.net/bugs/1888464 > - Version: stable/stein > - Tags: ipv6 > - Assignee: none > > Critical: > > - "[OVN]: creating a local switch port that has no tags should not fail" > - URL: https://bugs.launchpad.net/bugs/1888736 > - Assignee: ffernand > - Fix: https://review.opendev.org/742758 merged > > High: > > - "[FT] neutron.tests.functional.agent.linux.test_iptables.IptablesManagerNonRootTestCase test cases always failing" > - URL: https://bugs.launchpad.net/bugs/1888213 > - Tags: gate-failure > - Assignee: ralonsoh > - Fix: https://review.opendev.org/741957 released > > - '[neutron-tempest-plugin] greendns query has no attribute "_compute_expiration"' > - URL: https://bugs.launchpad.net/bugs/1888258 > - Fix: https://review.opendev.org/741986 released > > - "[OVN Octavia Provider] octavia_tempest_plugin.tests.api.v2.test_member.MemberAPITest.test_member_batch_update fails" > - URL: https://bugs.launchpad.net/bugs/1888489 > - Tags: ovn-octavia-provider > - Assignee: maciej.jozefczyk > - Fix: https://review.opendev.org/742410 > > - "[OVN Octavia Provider] octavia_tempest_plugin.tests.api.v2.test_pool.PoolAPITest.test_pool_create_with_listener fails" > - URL: https://bugs.launchpad.net/neutron/+bug/1888646 > - Tags: ovn-octavia-provider > - Assignee: maciej.jozefczyk > > - "OVN: Extra DHCP options validation broke the Ironic+OVN+DHCP agent combination" > - URL: https://bugs.launchpad.net/bugs/1888649 > - Assignee: lucasgomes > - Fix: https://review.opendev.org/742654 released > > - "[FT] TestMaintenance.test_port failing" > - URL: https://bugs.launchpad.net/bugs/1888828 > - Assignee: ralonsoh > - Fix: https://review.opendev.org/742868 > > Medium: > > - "neutron-sanity-check fails with NoSuchOptError: no such option vf_management" > - URL: https://bugs.launchpad.net/bugs/1888920 > - Assignee: ralonsoh > - Fix: https://review.opendev.org/743190 > > Wishlist: > > - "Change the default value of "propagate_uplink_status" to True" > - URL: https://bugs.launchpad.net/bugs/1888487 > > - "Improve core plugin extension filtering using the mechanism driver information" > - URL: https://bugs.launchpad.net/bugs/1888829 > > Next up is Akihiro. 
Last week's report by lajoskatona is here: > http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016002.html > > Thanks, > > Nate > From openstack at nemebean.com Mon Jul 27 20:07:20 2020 From: openstack at nemebean.com (Ben Nemec) Date: Mon, 27 Jul 2020 15:07:20 -0500 Subject: [oslo] PTO next week Message-ID: <90b323bf-4147-829b-f70c-a413ef00b28e@nemebean.com> Hello fellow virtual Scandinavians, As noted in the meeting this week, I will be avoiding human contact in a different location next week, one that has fewer people and no internet. That means I won't be running the meeting. As always, if someone else from the Oslo team wants to run it, please feel free. I'll catch up when I get back. Or I might appoint you PTL. Who knows! ;-) -Ben From mnaser at vexxhost.com Mon Jul 27 20:34:14 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 27 Jul 2020 16:34:14 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. # Patches ## Open Reviews - Migrate testing to ubuntu focal https://review.opendev.org/740851 - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 - [manila] assert:supports-accessible-upgrade https://review.opendev.org/740509 - V goals, Zuul v3 migration: update links and grenade https://review.opendev.org/741987 - Deprecate os_congress project https://review.opendev.org/742533 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 [Updated 34 days ago] - Add legacy repository validation https://review.opendev.org/737559 [Updated 19 days ago] ## General Changes - Migrate from mock to built in unittest.mock https://review.opendev.org/722924 # Email Threads - Legacy Zuul Jobs Update 1: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016058.html - Community PyCharm Licenses: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016039.html - Release Countdown R-12: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016056.html # Other Reminders - Milestone 2 is this week on July 30th Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From whayutin at redhat.com Mon Jul 27 21:17:34 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Mon, 27 Jul 2020 15:17:34 -0600 Subject: [tripleo][ci] container pulls failing Message-ID: FYI... If you find your jobs are failing with an error similar to [1], you have been rate limited by docker.io via the upstream mirror system and have hit [2]. I've been discussing the issue w/ upstream infra, rdo-infra and a few CI engineers. There are a few ways to mitigate the issue however I don't see any of the options being completed very quickly so I'm asking for your patience while this issue is socialized and resolved. For full transparency we're considering the following options. 1. move off of docker.io to quay.io 2. local container builds for each job in master, possibly ussuri 3. parent child jobs upstream where rpms and containers will be build and host artifacts for the child jobs 4. remove some portion of the upstream jobs to lower the impact we have on 3rd party infrastructure. If you have thoughts please don't hesitate to share on this thread. 
Very sorry we're hitting these failures and I really appreciate your patience. I would expect major delays in getting patches merged at this point until things are resolved. Thank you! [1] HTTPError: 429 Client Error: Too Many Requests for url: http://mirror.ca-ymq-1.vexxhost.opendev.org:8082/v2/tripleotrain/centos-binary-cron/blobs/sha256:76342b0db11c6b5acf33b9f1cbf10b3d2680fb20967ccd7daa9593a39e9e45c0 [2] https://bugs.launchpad.net/tripleo/+bug/1889122 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openstack.org Mon Jul 27 23:23:41 2020 From: jimmy at openstack.org (Jimmy McArthur) Date: Mon, 27 Jul 2020 18:23:41 -0500 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: <28b311a0-0de8-2929-fc7b-4fc513977204@openstack.org> References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> <28b311a0-0de8-2929-fc7b-4fc513977204@openstack.org> Message-ID: <4e722cd6-9ff7-ac76-03f3-61c352d96801@openstack.org> Thomas, We've pushed a fix for this.  Please let us know if you have any further trouble. Thank you! Jimmy Jimmy McArthur wrote on 7/27/20 10:54 AM: > That does indeed sound like a bug.  Let me test with your account and > I'll update ASAP. > > Cheers, > Jimmy > > Thomas Goirand wrote on 7/27/20 2:08 AM: >> Well, there's a bug then... >> >> When I got to: >> https://cfp.openstack.org/app/profile >> >> under the Email field, it displays tho%2A%2A%40goirand.fr which I cannot >> edit. Then when I click on SAVE, I'm being told that the email isn't a >> valid one (but I cannot edit it...). >> >> As a result, I can never save my updated bio... > > From zigo at debian.org Tue Jul 28 07:04:49 2020 From: zigo at debian.org (Thomas Goirand) Date: Tue, 28 Jul 2020 09:04:49 +0200 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: <4e722cd6-9ff7-ac76-03f3-61c352d96801@openstack.org> References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> <28b311a0-0de8-2929-fc7b-4fc513977204@openstack.org> <4e722cd6-9ff7-ac76-03f3-61c352d96801@openstack.org> Message-ID: <1fd2cd70-1b9b-47d1-9236-97673247f295@debian.org> On 7/28/20 1:23 AM, Jimmy McArthur wrote: > Jimmy McArthur wrote on 7/27/20 10:54 AM: >> That does indeed sound like a bug.  Let me test with your account and >> I'll update ASAP. >> >> Cheers, >> Jimmy >> >> Thomas Goirand wrote on 7/27/20 2:08 AM: >>> Well, there's a bug then... >>> >>> When I got to: >>> https://cfp.openstack.org/app/profile >>> >>> under the Email field, it displays tho%2A%2A%40goirand.fr which I cannot >>> edit. Then when I click on SAVE, I'm being told that the email isn't a >>> valid one (but I cannot edit it...). >>> >>> As a result, I can never save my updated bio... > > Thomas, > > We've pushed a fix for this. Please let us know if you have any further > trouble. > > Thank you! > Jimmy This worked, thanks Jimmy! One last bug though: I can't select anything in "What is your current Organizational Role at your company? (check all that apply):" (ie: when I click, nothing happens... checkboxes stay untick). 
Cheers, Thomas Goirand (zigo) From johannes.kulik at sap.com Tue Jul 28 08:02:18 2020 From: johannes.kulik at sap.com (Johannes Kulik) Date: Tue, 28 Jul 2020 10:02:18 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> Message-ID: <671fec63-8bea-4215-c773-d8360e368a99@sap.com> Hi, On 7/27/20 7:08 PM, Dan Smith wrote: > > The primary concern was about something other than nova sitting on our > bus making calls to our internal services. I imagine that the proposal > to bake it into oslo.messaging is for the same purpose, and I'd probably > have the same concern. At the time I think we agreed that if we were > going to support direct-to-service health checks, they should be teensy > HTTP servers with oslo healthchecks middleware. Further loading down > rabbit with those pings doesn't seem like the best plan to > me. Especially since Nova (compute) services already check in over RPC > periodically and the success of that is discoverable en masse through > the API. > > --Dan > While I get this concern, we have seen the problem described by the original poster in production multiple times: nova-compute reports to be healthy, is seen as up through the API, but doesn't work on any messages anymore. A health-check going through rabbitmq would really help spotting those situations, while having an additional HTTP server doesn't. Have a nice day, Johannes -- Johannes Kulik IT Architecture Senior Specialist *SAP SE *| Rosenthaler Str. 30 | 10178 Berlin | Germany From bdobreli at redhat.com Tue Jul 28 08:38:30 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 28 Jul 2020 10:38:30 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> Message-ID: <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> On 7/27/20 7:08 PM, Dan Smith wrote: >> Tagging with Nova and Neutron as they are mentioned and I thought some >> people from those teams had opinions on this. > > Nova already implements ping() on the compute RPC interface, which we > use to make sure compute waits to start up until conductor is available > to do its bidding. So if a new obligatory RPC server method is actually > added called ping(), it will break us. > >> Can you refresh my memory on why we dropped this before? I recall >> talking about it in Denver, but I can't for the life of me remember >> what the conclusion was. Did we intend to use something else for this >> that has since fallen through? > > The prior conversation I recall was about helm sitting on our bus to > (ab)use our ping method for health checks: > > https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa > > I believe that has since been reverted. > > The primary concern was about something other than nova sitting on our > bus making calls to our internal services. I imagine that the proposal > to bake it into oslo.messaging is for the same purpose, and I'd probably > have the same concern. At the time I think we agreed that if we were > going to support direct-to-service health checks, they should be teensy > HTTP servers with oslo healthchecks middleware. Further loading down > rabbit with those pings doesn't seem like the best plan to > me. Especially since Nova (compute) services already check in over RPC > periodically and the success of that is discoverable en masse through > the API. 
Having RPC ping in the common messaging library could improve aliveness handling of long-running APIs, like listing multiple Neutron ports or Heat objects with full details, or running some longish Mistral workflow maybe. Indeed it should be made not breaking things already existing in Nova ofc. > > --Dan > -- Best regards, Bogdan Dobrelya, Irc #bogdando From mark at stackhpc.com Tue Jul 28 09:30:57 2020 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 28 Jul 2020 10:30:57 +0100 Subject: London Openinfra virtual meetup In-Reply-To: References: Message-ID: On Mon, 27 Jul 2020 at 19:07, Mark Goddard wrote: > > On Mon, 27 Jul 2020 at 19:06, Mark Goddard wrote: > > > > Hi, > > > > The London Openinfra meetup group [1] are hosting a virtual meetup [2] > > this Thursday (30th July) at 17:00 UTC. Owing to the magic of video > > streaming [3], attendees do not need to be physically present in > > London to watch! > > > > Here's the schedule: > > > > 6:00pm to 6:30pm - Introductions, OpenStack 10th anniversary retrospective > > 6:30pm to 7:15pm - Edge ecosystem, use cases and architectures (Ildikó > > Vancsa, OpenStack Foundation and Gergely Csatari, Nokia) > > 7:20pm to 8:05pm - Kayobe & Kolla - sane OpenStack deployment (Mark > > Goddard, StackHPC > > 8:15pm to 9:00pm - TBC And if that isn't enough to tempt you, the final speaker has been confirmed: 8:15pm to 9:00pm - 7 years of CERN Cloud - From 0 to 300k cores (Belmiro Moreira, CERN) > > These times are BST, which is UTC+1. > > > > > I can guarantee that Ildikó and Gergely's talk will be great. As for > > the other one, you'll have to judge for yourselves ;) > > > > Cheers, > > Mark > > > > [1] https://www.meetup.com/OpenInfra-London/ > > [2] https://www.meetup.com/OpenInfra-London/events/272083028/ > > [3] https://www.youtube.com/watch?v=0liqSO0SZ60 From skaplons at redhat.com Tue Jul 28 10:37:04 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 28 Jul 2020 12:37:04 +0200 Subject: [neutron][networking-midonet] Maintainers needed In-Reply-To: <20200623072559.aqmzr7ljvavwfsfp@skaplons-mac> References: <20200623072559.aqmzr7ljvavwfsfp@skaplons-mac> Message-ID: <8568B0F3-FC97-46F5-8142-29447BEEB99E@redhat.com> Hi, It’s been a while since I sent last message and ask if anyone wants to keep networking-midonet maintained in Neutron stadium. I didn’t get any responses for that yet and as we are now in a week of Victoria-2 milestone it’s “final call” for new maintainers for this project. Next week I will start sending patches to mark this project as deprecated in the same way as we did with neutron-fwaas in the U cycle. So if You are using networking-midonet and You want to keep it in Neutron stadium, please reach out to me by email or on IRC. > On 23 Jun 2020, at 09:25, Slawek Kaplonski wrote: > > Hi, > > Over the past couple of cycles we have noticed that new contributions and > maintenance efforts for networking-midonet project were lower and lower. > This impacts patches for bug fixes, new features and reviews. The Neutron > core team is trying to at least keep the CI of this project more or less > healthy, but we don't have enough cycles and knowledge about the details of > this project code base to review more complex patches. > > During the PTG in Shanghai we discussed that with operators and TC members > during the forum session [1] and later within the Neutron team during the > PTG session [2]. 
> > During these discussions, with the help of operators and TC members, we reached > the conclusion that we need to have someone responsible for maintaining project. > This doesn't mean that the maintainer needs to spend full time working on this > project. Rather, we need someone to be the contact person for the project, who > takes care of the project's CI and review patches. Of course that'ss only a > minimal requirement. If the new maintainer works on new features for the > project, it's even better :) > > I recently spoke with members of current networking-midonet core team and they > told me that they don't have cycles to focus on this project. > So if we don't have any new maintainer(s) before milestone Victoria-2, which is > Jul 27 - Jul 31 according to [3], we will need to mark networking-midonet > as deprecated for removal. > "Removal" means that in W cycle we will remove code of this project from its > master branch and there will be no new releases of it. We will still keep code > in stable branches which were already released until its EOL. > > So if You are using this project now, or if You have customers who are > using it, please consider the possibility of maintaining it. Otherwise, > please be aware that it is highly possible that the project will be > deprecated and moved out from the official OpenStack projects. > > [1] https://etherpad.openstack.org/p/PVG-Neutron-stadium-projects-the-path-forward > [2] https://etherpad.openstack.org/p/Shanghai-Neutron-Planning-restored - Lines 379-421 > [3] https://releases.openstack.org/victoria/schedule.html > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Principal software engineer Red Hat From emilien at redhat.com Tue Jul 28 13:06:24 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 28 Jul 2020 09:06:24 -0400 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin wrote: > FYI... > > If you find your jobs are failing with an error similar to [1], you have > been rate limited by docker.io via the upstream mirror system and have > hit [2]. I've been discussing the issue w/ upstream infra, rdo-infra and a > few CI engineers. > > There are a few ways to mitigate the issue however I don't see any of the > options being completed very quickly so I'm asking for your patience while > this issue is socialized and resolved. > > For full transparency we're considering the following options. > > 1. move off of docker.io to quay.io > quay.io also has API rate limit: https://docs.quay.io/issues/429.html Now I'm not sure about how many requests per seconds one can do vs the other but this would need to be checked with the quay team before changing anything. Also quay.io had its big downtimes as well, SLA needs to be considered. 2. local container builds for each job in master, possibly ussuri > Not convinced. You can look at CI logs: - pulling / updating / pushing container images from docker.io to local registry takes ~10 min on standalone (OVH) - building containers from scratch with updated repos and pushing them to local registry takes ~29 min on standalone (OVH). > 3. parent child jobs upstream where rpms and containers will be build and > host artifacts for the child jobs > Yes, we need to investigate that. > 4. remove some portion of the upstream jobs to lower the impact we have on > 3rd party infrastructure. > I'm not sure I understand this one, maybe you can give an example of what could be removed? 
> If you have thoughts please don't hesitate to share on this thread. Very > sorry we're hitting these failures and I really appreciate your patience. > I would expect major delays in getting patches merged at this point until > things are resolved. > > Thank you! > > [1] HTTPError: 429 Client Error: Too Many Requests for url: > http://mirror.ca-ymq-1.vexxhost.opendev.org:8082/v2/tripleotrain/centos-binary-cron/blobs/sha256:76342b0db11c6b5acf33b9f1cbf10b3d2680fb20967ccd7daa9593a39e9e45c0 > [2] https://bugs.launchpad.net/tripleo/+bug/1889122 > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Jul 28 13:20:17 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 28 Jul 2020 07:20:17 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi wrote: > > > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin wrote: >> >> FYI... >> >> If you find your jobs are failing with an error similar to [1], you have been rate limited by docker.io via the upstream mirror system and have hit [2]. I've been discussing the issue w/ upstream infra, rdo-infra and a few CI engineers. >> >> There are a few ways to mitigate the issue however I don't see any of the options being completed very quickly so I'm asking for your patience while this issue is socialized and resolved. >> >> For full transparency we're considering the following options. >> >> 1. move off of docker.io to quay.io > > > quay.io also has API rate limit: > https://docs.quay.io/issues/429.html > > Now I'm not sure about how many requests per seconds one can do vs the other but this would need to be checked with the quay team before changing anything. > Also quay.io had its big downtimes as well, SLA needs to be considered. > >> 2. local container builds for each job in master, possibly ussuri > > > Not convinced. > You can look at CI logs: > - pulling / updating / pushing container images from docker.io to local registry takes ~10 min on standalone (OVH) > - building containers from scratch with updated repos and pushing them to local registry takes ~29 min on standalone (OVH). > >> >> 3. parent child jobs upstream where rpms and containers will be build and host artifacts for the child jobs > > > Yes, we need to investigate that. > >> >> 4. remove some portion of the upstream jobs to lower the impact we have on 3rd party infrastructure. > > > I'm not sure I understand this one, maybe you can give an example of what could be removed? We need to re-evaulate our use of scenarios (e.g. we have two scenario010's both are non-voting). There's a reason we historically didn't want to add more jobs because of these types of resource constraints. I think we've added new jobs recently and likely need to reduce what we run. Additionally we might want to look into reducing what we run on stable branches as well. > >> >> If you have thoughts please don't hesitate to share on this thread. Very sorry we're hitting these failures and I really appreciate your patience. I would expect major delays in getting patches merged at this point until things are resolved. >> >> Thank you! 
>> >> [1] HTTPError: 429 Client Error: Too Many Requests for url: http://mirror.ca-ymq-1.vexxhost.opendev.org:8082/v2/tripleotrain/centos-binary-cron/blobs/sha256:76342b0db11c6b5acf33b9f1cbf10b3d2680fb20967ccd7daa9593a39e9e45c0 >> [2] https://bugs.launchpad.net/tripleo/+bug/1889122 > > > > -- > Emilien Macchi From emilien at redhat.com Tue Jul 28 13:23:54 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 28 Jul 2020 09:23:54 -0400 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz wrote: > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi wrote: > > > > > > > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin > wrote: > >> > >> FYI... > >> > >> If you find your jobs are failing with an error similar to [1], you > have been rate limited by docker.io via the upstream mirror system and > have hit [2]. I've been discussing the issue w/ upstream infra, rdo-infra > and a few CI engineers. > >> > >> There are a few ways to mitigate the issue however I don't see any of > the options being completed very quickly so I'm asking for your patience > while this issue is socialized and resolved. > >> > >> For full transparency we're considering the following options. > >> > >> 1. move off of docker.io to quay.io > > > > > > quay.io also has API rate limit: > > https://docs.quay.io/issues/429.html > > > > Now I'm not sure about how many requests per seconds one can do vs the > other but this would need to be checked with the quay team before changing > anything. > > Also quay.io had its big downtimes as well, SLA needs to be considered. > > > >> 2. local container builds for each job in master, possibly ussuri > > > > > > Not convinced. > > You can look at CI logs: > > - pulling / updating / pushing container images from docker.io to local > registry takes ~10 min on standalone (OVH) > > - building containers from scratch with updated repos and pushing them > to local registry takes ~29 min on standalone (OVH). > > > >> > >> 3. parent child jobs upstream where rpms and containers will be build > and host artifacts for the child jobs > > > > > > Yes, we need to investigate that. > > > >> > >> 4. remove some portion of the upstream jobs to lower the impact we have > on 3rd party infrastructure. > > > > > > I'm not sure I understand this one, maybe you can give an example of > what could be removed? > > We need to re-evaulate our use of scenarios (e.g. we have two > scenario010's both are non-voting). There's a reason we historically > didn't want to add more jobs because of these types of resource > constraints. I think we've added new jobs recently and likely need to > reduce what we run. Additionally we might want to look into reducing > what we run on stable branches as well. > Oh... removing jobs (I thought we would remove some steps of the jobs). Yes big +1, this should be a continuous goal when working on CI, and always evaluating what we need vs what we run now. We should look at: 1) services deployed in scenarios that aren't worth testing (e.g. deprecated or unused things) (and deprecate the unused things) 2) jobs themselves (I don't have any example beside scenario010 but I'm sure there are more). -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kgiusti at gmail.com Tue Jul 28 14:11:47 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Tue, 28 Jul 2020 10:11:47 -0400 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> Message-ID: On Tue, Jul 28, 2020 at 4:48 AM Bogdan Dobrelya wrote: > On 7/27/20 7:08 PM, Dan Smith wrote: > >> Tagging with Nova and Neutron as they are mentioned and I thought some > >> people from those teams had opinions on this. > > > > Nova already implements ping() on the compute RPC interface, which we > > use to make sure compute waits to start up until conductor is available > > to do its bidding. So if a new obligatory RPC server method is actually > > added called ping(), it will break us. > > > >> Can you refresh my memory on why we dropped this before? I recall > >> talking about it in Denver, but I can't for the life of me remember > >> what the conclusion was. Did we intend to use something else for this > >> that has since fallen through? > > > > The prior conversation I recall was about helm sitting on our bus to > > (ab)use our ping method for health checks: > > > > > https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa > > > > I believe that has since been reverted. > > > > The primary concern was about something other than nova sitting on our > > bus making calls to our internal services. I imagine that the proposal > > to bake it into oslo.messaging is for the same purpose, and I'd probably > > have the same concern. At the time I think we agreed that if we were > > going to support direct-to-service health checks, they should be teensy > > HTTP servers with oslo healthchecks middleware. Further loading down > > rabbit with those pings doesn't seem like the best plan to > > me. Especially since Nova (compute) services already check in over RPC > > periodically and the success of that is discoverable en masse through > > the API. > > Having RPC ping in the common messaging library could improve aliveness > handling of long-running APIs, like listing multiple Neutron ports or > Heat objects with full details, or running some longish Mistral workflow > maybe. Indeed it should be made not breaking things already existing in > Nova ofc. > > Not sure this is related to your concern about long running API's but O.M. has an optional RPC call heartbeat monitor that verifies the connectivity to the server while the call is in progress. See the description of call_monitor_timeout in the RPC client docs [0]. 0: https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html > > > > --Dan > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Jul 28 14:23:47 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 28 Jul 2020 10:23:47 -0400 Subject: [cinder][stable] branch freeze for ocata, pike Message-ID: tl;dr - do not approve any backports to stable/ocata or stable/pike in any Cinder project deliverable stable/ocata has been tagged with ocata-eol in cinder, os-brick, python-cinderclient, and python-brick-cinderclient-ext. Nothing should be merged into stable/ocata in any of these repositories during the interim period before the branches are deleted. 
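For anyone who wants to double-check the tags before the branches disappear, they are visible without a local clone; a quick sketch (repository URLs per the opendev hosting, adjust for the other deliverables):

    git ls-remote --tags https://opendev.org/openstack/cinder refs/tags/ocata-eol
    git ls-remote --tags https://opendev.org/openstack/os-brick refs/tags/ocata-eol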
stable/pike: the changes discussed in [0] have merged, and I've proposed the pike-eol tags [1]. Nothing should be merged into stable/pike in any of our code repositories from now until the branches are deleted. [0] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016076.html [1] https://review.opendev.org/#/c/742523/ From bdobreli at redhat.com Tue Jul 28 14:25:04 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 28 Jul 2020 16:25:04 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> Message-ID: On 7/28/20 4:11 PM, Ken Giusti wrote: > > > On Tue, Jul 28, 2020 at 4:48 AM Bogdan Dobrelya > wrote: > > On 7/27/20 7:08 PM, Dan Smith wrote: > >> Tagging with Nova and Neutron as they are mentioned and I > thought some > >> people from those teams had opinions on this. > > > > Nova already implements ping() on the compute RPC interface, which we > > use to make sure compute waits to start up until conductor is > available > > to do its bidding. So if a new obligatory RPC server method is > actually > > added called ping(), it will break us. > > > >> Can you refresh my memory on why we dropped this before? I recall > >> talking about it in Denver, but I can't for the life of me remember > >> what the conclusion was. Did we intend to use something else for > this > >> that has since fallen through? > > > > The prior conversation I recall was about helm sitting on our bus to > > (ab)use our ping method for health checks: > > > > > https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa > > > > I believe that has since been reverted. > > > > The primary concern was about something other than nova sitting > on our > > bus making calls to our internal services. I imagine that the > proposal > > to bake it into oslo.messaging is for the same purpose, and I'd > probably > > have the same concern. At the time I think we agreed that if we were > > going to support direct-to-service health checks, they should be > teensy > > HTTP servers with oslo healthchecks middleware. Further loading down > > rabbit with those pings doesn't seem like the best plan to > > me. Especially since Nova (compute) services already check in > over RPC > > periodically and the success of that is discoverable en masse through > > the API. > > Having RPC ping in the common messaging library could improve aliveness > handling of long-running APIs, like listing multiple Neutron ports or > Heat objects with full details, or running some longish Mistral > workflow > maybe. Indeed it should be made not breaking things already existing in > Nova ofc. > > > Not sure this is related to your concern about long running API's but > O.M. has an optional RPC call heartbeat monitor that verifies the > connectivity to the server while the call is in progress.  See the > description of call_monitor_timeout in the RPC client docs [0]. Correct, but heartbeats didn't show off as a reliable solution. There were WSGI & eventlet related issues [1] with running heartbeats. I can't recall that was the final outcome of that discussion and what was the fix. So relying on explicit pings sent by clients could work better perhaps. 
[1] https://bugs.launchpad.net/tripleo/+bug/1829062 > > 0: https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html > > > > > > > --Dan > > > > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > > > -- > Ken Giusti  (kgiusti at gmail.com ) -- Best regards, Bogdan Dobrelya, Irc #bogdando From kgiusti at gmail.com Tue Jul 28 14:25:20 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Tue, 28 Jul 2020 10:25:20 -0400 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> Message-ID: On Mon, Jul 27, 2020 at 1:18 PM Dan Smith wrote: > > Tagging with Nova and Neutron as they are mentioned and I thought some > > people from those teams had opinions on this. > > Nova already implements ping() on the compute RPC interface, which we > use to make sure compute waits to start up until conductor is available > to do its bidding. So if a new obligatory RPC server method is actually > added called ping(), it will break us. > > > Can you refresh my memory on why we dropped this before? I recall > > talking about it in Denver, but I can't for the life of me remember > > what the conclusion was. Did we intend to use something else for this > > that has since fallen through? > > The prior conversation I recall was about helm sitting on our bus to > (ab)use our ping method for health checks: > > > https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa > > I believe that has since been reverted. > > The primary concern was about something other than nova sitting on our > bus making calls to our internal services. I imagine that the proposal > to bake it into oslo.messaging is for the same purpose, and I'd probably > have the same concern. At the time I think we agreed that if we were > going to support direct-to-service health checks, they should be teensy > HTTP servers with oslo healthchecks middleware. Further loading down > rabbit with those pings doesn't seem like the best plan to > me. Especially since Nova (compute) services already check in over RPC > periodically and the success of that is discoverable en masse through > the API. > > --Dan > > While initially in favor of this feature Dan's concern has me reconsidering this. Now I believe that if the purpose of this feature is to check the operational health of a service _using_ oslo.messaging, then I'm against it. A naked ping to a generic service point in an application doesn't prove the operating health of that application beyond its connection to rabbit. Connectivity monitoring between an application and rabbit is done using the keepalive connection heartbeat mechanism built into the rabbit protocol, which O.M. supports today. -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Jul 28 14:50:55 2020 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 28 Jul 2020 15:50:55 +0100 Subject: =?UTF-8?Q?=5Bkolla=5D_Proposing_Micha=C5=82_Nasiadka_for_kayobe=2Dcore?= Message-ID: Hi, I'd like to propose adding Michał Nasiadka to the kayobe-core group. Michał is a valued member of the Kolla core team, and has been providing some good patches and reviews for Kayobe too. Kayobians, please respond with +1/-1. 
Cheers, Mark From pierre at stackhpc.com Tue Jul 28 15:02:02 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Tue, 28 Jul 2020 17:02:02 +0200 Subject: =?UTF-8?Q?Re=3A_=5Bkolla=5D_Proposing_Micha=C5=82_Nasiadka_for_kayobe=2Dco?= =?UTF-8?Q?re?= In-Reply-To: References: Message-ID: Thank you Michał for your contributions! +1 On Tue, 28 Jul 2020 at 16:51, Mark Goddard wrote: > > Hi, > > I'd like to propose adding Michał Nasiadka to the kayobe-core group. > Michał is a valued member of the Kolla core team, and has been > providing some good patches and reviews for Kayobe too. > > Kayobians, please respond with +1/-1. > > Cheers, > Mark > From doug at stackhpc.com Tue Jul 28 15:08:42 2020 From: doug at stackhpc.com (Doug Szumski) Date: Tue, 28 Jul 2020 16:08:42 +0100 Subject: =?UTF-8?Q?Re=3a_=5bkolla=5d_Proposing_Micha=c5=82_Nasiadka_for_kayo?= =?UTF-8?Q?be-core?= In-Reply-To: References: Message-ID: On 28/07/2020 15:50, Mark Goddard wrote: > Hi, > > I'd like to propose adding Michał Nasiadka to the kayobe-core group. > Michał is a valued member of the Kolla core team, and has been > providing some good patches and reviews for Kayobe too. > > Kayobians, please respond with +1/-1. Sounds excellent, +1 for Michał! > > Cheers, > Mark > From openstack at nemebean.com Tue Jul 28 15:09:32 2020 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 28 Jul 2020 10:09:32 -0500 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> Message-ID: <4c57f258-346e-7fcf-7661-7cac0ab56d31@nemebean.com> On 7/28/20 9:25 AM, Bogdan Dobrelya wrote: > On 7/28/20 4:11 PM, Ken Giusti wrote: >> >> >> On Tue, Jul 28, 2020 at 4:48 AM Bogdan Dobrelya > > wrote: >> >>     On 7/27/20 7:08 PM, Dan Smith wrote: >>      >> Tagging with Nova and Neutron as they are mentioned and I >>     thought some >>      >> people from those teams had opinions on this. >>      > >>      > Nova already implements ping() on the compute RPC interface, >> which we >>      > use to make sure compute waits to start up until conductor is >>     available >>      > to do its bidding. So if a new obligatory RPC server method is >>     actually >>      > added called ping(), it will break us. >>      > >>      >> Can you refresh my memory on why we dropped this before? I recall >>      >> talking about it in Denver, but I can't for the life of me >> remember >>      >> what the conclusion was. Did we intend to use something else for >>     this >>      >> that has since fallen through? >>      > >>      > The prior conversation I recall was about helm sitting on our >> bus to >>      > (ab)use our ping method for health checks: >>      > >>      > >> >> https://opendev.org/openstack/openstack-helm/commit/baf5356a4fb61590a95f64a63c0dcabfebb3baaa >> >>      > >>      > I believe that has since been reverted. >>      > >>      > The primary concern was about something other than nova sitting >>     on our >>      > bus making calls to our internal services. I imagine that the >>     proposal >>      > to bake it into oslo.messaging is for the same purpose, and I'd >>     probably >>      > have the same concern. At the time I think we agreed that if we >> were >>      > going to support direct-to-service health checks, they should be >>     teensy >>      > HTTP servers with oslo healthchecks middleware. 
Further loading >> down >>      > rabbit with those pings doesn't seem like the best plan to >>      > me. Especially since Nova (compute) services already check in >>     over RPC >>      > periodically and the success of that is discoverable en masse >> through >>      > the API. >> >>     Having RPC ping in the common messaging library could improve >> aliveness >>     handling of long-running APIs, like listing multiple Neutron ports or >>     Heat objects with full details, or running some longish Mistral >>     workflow >>     maybe. Indeed it should be made not breaking things already >> existing in >>     Nova ofc. >> >> >> Not sure this is related to your concern about long running API's but >> O.M. has an optional RPC call heartbeat monitor that verifies the >> connectivity to the server while the call is in progress.  See the >> description of call_monitor_timeout in the RPC client docs [0]. > > Correct, but heartbeats didn't show off as a reliable solution. There > were WSGI & eventlet related issues [1] with running heartbeats. I can't > recall that was the final outcome of that discussion and what was the > fix. So relying on explicit pings sent by clients could work better > perhaps. How so? The client is going to do the exact same thing as oslo.messaging heartbeats - start a separate thread to send pings, then make the long-running RPC call. It would hit the same eventlet/wsgi bug that oslo.messaging does. Also, there's a workaround for that bug in oslo.messaging: https://github.com/openstack/oslo.messaging/commit/1541b0c7f965b9defb02b9e63975db2d29d99242 If you re-implemented heartbeating you would have to also re-implement the workaround. On a related note, I've added a topic to our next meeting to discuss turning that workaround on by default since it's been there for a year and no one has complained that it broke them. > > [1] https://bugs.launchpad.net/tripleo/+bug/1829062 > >> >> 0: >> https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html >> >> >> >>      > >>      > --Dan >>      > >> >> >>     --     Best regards, >>     Bogdan Dobrelya, >>     Irc #bogdando >> >> >> >> >> -- >> Ken Giusti  (kgiusti at gmail.com ) > > From alex.williamson at redhat.com Mon Jul 27 22:23:21 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Mon, 27 Jul 2020 16:23:21 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200727072440.GA28676@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> Message-ID: <20200727162321.7097070e@x1.home> On Mon, 27 Jul 2020 15:24:40 +0800 Yan Zhao wrote: > > > As you indicate, the vendor driver is responsible for checking version > > > information embedded within the migration stream. Therefore a > > > migration should fail early if the devices are incompatible. Is it > > but as I know, currently in VFIO migration protocol, we have no way to > > get vendor specific compatibility checking string in migration setup stage > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > In this way, for devices who does not save device data in precopy stage, > > the migration compatibility checking is as late as in stop-and-copy > > stage, which is too late. 
> > do you think we need to add the getting/checking of vendor specific > > compatibility string early in save_setup stage? > > > hi Alex, > after an offline discussion with Kevin, I realized that it may not be a > problem if migration compatibility check in vendor driver occurs late in > stop-and-copy phase for some devices, because if we report device > compatibility attributes clearly in an interface, the chances for > libvirt/openstack to make a wrong decision is little. I think it would be wise for a vendor driver to implement a pre-copy phase, even if only to send version information and verify it at the target. Deciding you have no device state to send during pre-copy does not mean your vendor driver needs to opt-out of the pre-copy phase entirely. Please also note that pre-copy is at the user's discretion, we've defined that we can enter stop-and-copy at any point, including without a pre-copy phase, so I would recommend that vendor drivers validate compatibility at the start of both the pre-copy and the stop-and-copy phases. > so, do you think we are now arriving at an agreement that we'll give up > the read-and-test scheme and start to defining one interface (perhaps in > json format), from which libvirt/openstack is able to parse and find out > compatibility list of a source mdev/physical device? Based on the feedback we've received, the previously proposed interface is not viable. I think there's agreement that the user needs to be able to parse and interpret the version information. Using json seems viable, but I don't know if it's the best option. Is there any precedent of markup strings returned via sysfs we could follow? Your idea of having both a "self" object and an array of "compatible" objects is perhaps something we can build on, but we must not assume PCI devices at the root level of the object. Providing both the mdev-type and the driver is a bit redundant, since the former includes the latter. We can't have vendor specific versioning schemes though, ie. gvt-version. We need to agree on a common scheme and decide which fields the version is relative to, ex. just the mdev type? I had also proposed fields that provide information to create a compatible type, for example to create a type_x2 device from a type_x1 mdev type, they need to know to apply an aggregation attribute. If we need to explicitly list every aggregation value and the resulting type, I think we run aground of what aggregation was trying to avoid anyway, so we might need to pick a language that defines variable substitution or some kind of tagging. For example if we could define ${aggr} as an integer within a specified range, then we might be able to define a type relative to that value (type_x${aggr}) which requires an aggregation attribute using the same value. I dunno, just spit balling. Thanks, Alex From niulixin at baidu.com Tue Jul 28 12:28:54 2020 From: niulixin at baidu.com (Niu,Lixin) Date: Tue, 28 Jul 2020 12:28:54 +0000 Subject: privsep helper command exited non-zero (1) [openstack-dev][kuryr] Message-ID: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> HI List When I run the kuryr, and launch the pods from k8s, I get some error like the title. Could you please help me to solve this problem? Please give some tips. Thanks a lot. 
2020-07-28 16:35:22.084 27610 ERROR os_vif [-] Failed to plug vif VIFBridge(active=False,address=fa:16:3e:f2:ff:e6,bridge_name='qbradfc2b63-47',has_traffic_filtering=True,id=adfc2b63-471e-4fc6-b7be-72af56ee1f27,network=Network(652fdbae-281d-4475-b883-dcfecb821cbd),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapadfc2b63-47'): oslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited non-zero (1) -------------- next part -------------- An HTML attachment was scrubbed... URL: From iurygregory at gmail.com Tue Jul 28 15:50:09 2020 From: iurygregory at gmail.com (Iury Gregory) Date: Tue, 28 Jul 2020 17:50:09 +0200 Subject: privsep helper command exited non-zero (1) [openstack-dev][kuryr] In-Reply-To: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> References: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> Message-ID: Hi Niu, Not sure if you will get many answers here since the main list for discussions for openstack is openstack-discuss at lists.openstack.org =) https://superuser.openstack.org/articles/openstack-unifies-mailing-lists/ This article has information about the reason for the new list. Em ter., 28 de jul. de 2020 às 17:17, Niu,Lixin escreveu: > HI List > > When I run the kuryr, and launch the pods from k8s, I get some > error like the title. > > Could you please help me to solve this problem? Please give some > tips. Thanks a lot. > > > > 2020-07-28 16:35:22.084 27610 ERROR os_vif [-] Failed to plug vif > VIFBridge(active=False,address=fa:16:3e:f2:ff:e6,bridge_name='qbradfc2b63-47',has_traffic_filtering=True,id=adfc2b63-471e-4fc6-b7be-72af56ee1f27,network=Network(652fdbae-281d-4475-b883-dcfecb821cbd),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapadfc2b63-47'): > oslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited > non-zero (1) > > > -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jul 28 16:33:34 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 28 Jul 2020 16:33:34 +0000 Subject: privsep helper command exited non-zero (1) [openstack-dev][kuryr] In-Reply-To: References: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> Message-ID: <20200728163333.4rsxanxxmx6gugib@yuggoth.org> On 2020-07-28 17:50:09 +0200 (+0200), Iury Gregory wrote: > Not sure if you will get many answers here since the main list for > discussions for openstack is openstack-discuss at lists.openstack.org =) [...] The old openstack-dev address forwards to the openstack-discuss list via an alias (same for interop-wg, openstack, openstack-infra, openstack-operators, openstack-sigs, openstack-tc and user-committee). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mdulko at redhat.com Tue Jul 28 17:15:03 2020 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Tue, 28 Jul 2020 19:15:03 +0200 Subject: privsep helper command exited non-zero (1) [openstack-dev][kuryr] In-Reply-To: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> References: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> Message-ID: <2b387f8a48ed6d408af062d6759877b9260d8415.camel@redhat.com> On Tue, 2020-07-28 at 12:28 +0000, Niu,Lixin wrote: > HI List > When I run the kuryr, and launch the pods from k8s, I get some > error like the title. > Could you please help me to solve this problem? Please give > some tips. Thanks a lot. > > 2020-07-28 16:35:22.084 27610 ERROR os_vif [-] Failed to plug vif VIFBridge(active=False,address=fa:16:3e:f2:ff:e6,bridge_name='qbradfc2b63-47',has_traffic_filtering=True,id=adfc2b63-471e-4fc6-b7be-72af56ee1f27,network=Network(652fdbae-281d-4475-b883-dcfecb821cbd),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapadfc2b63-47'): oslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited non-zero (1) > Hi, This seems pretty vague, but my first bet would be that kuryr-daemon is not running as root? What's the exact configuration you use - i.e. Neutron plugin, Kuryr binding plugin (nested vs neutron), are you running Kuryr services as pods? Thanks, Michał From whayutin at redhat.com Tue Jul 28 16:09:49 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Tue, 28 Jul 2020 10:09:49 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi wrote: > > > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz wrote: > >> On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >> wrote: >> > >> > >> > >> > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >> wrote: >> >> >> >> FYI... >> >> >> >> If you find your jobs are failing with an error similar to [1], you >> have been rate limited by docker.io via the upstream mirror system and >> have hit [2]. I've been discussing the issue w/ upstream infra, rdo-infra >> and a few CI engineers. >> >> >> >> There are a few ways to mitigate the issue however I don't see any of >> the options being completed very quickly so I'm asking for your patience >> while this issue is socialized and resolved. >> >> >> >> For full transparency we're considering the following options. >> >> >> >> 1. move off of docker.io to quay.io >> > >> > >> > quay.io also has API rate limit: >> > https://docs.quay.io/issues/429.html >> > >> > Now I'm not sure about how many requests per seconds one can do vs the >> other but this would need to be checked with the quay team before changing >> anything. >> > Also quay.io had its big downtimes as well, SLA needs to be considered. >> > >> >> 2. local container builds for each job in master, possibly ussuri >> > >> > >> > Not convinced. >> > You can look at CI logs: >> > - pulling / updating / pushing container images from docker.io to >> local registry takes ~10 min on standalone (OVH) >> > - building containers from scratch with updated repos and pushing them >> to local registry takes ~29 min on standalone (OVH). >> > >> >> >> >> 3. parent child jobs upstream where rpms and containers will be build >> and host artifacts for the child jobs >> > >> > >> > Yes, we need to investigate that. >> > >> >> >> >> 4. 
remove some portion of the upstream jobs to lower the impact we >> have on 3rd party infrastructure. >> > >> > >> > I'm not sure I understand this one, maybe you can give an example of >> what could be removed? >> >> We need to re-evaulate our use of scenarios (e.g. we have two >> scenario010's both are non-voting). There's a reason we historically >> didn't want to add more jobs because of these types of resource >> constraints. I think we've added new jobs recently and likely need to >> reduce what we run. Additionally we might want to look into reducing >> what we run on stable branches as well. >> > > Oh... removing jobs (I thought we would remove some steps of the jobs). > Yes big +1, this should be a continuous goal when working on CI, and > always evaluating what we need vs what we run now. > > We should look at: > 1) services deployed in scenarios that aren't worth testing (e.g. > deprecated or unused things) (and deprecate the unused things) > 2) jobs themselves (I don't have any example beside scenario010 but I'm > sure there are more). > -- > Emilien Macchi > Thanks Alex, Emilien +1 to reviewing the catalog and adjusting things on an ongoing basis. All.. it looks like the issues with docker.io were more of a flare up than a change in docker.io policy or infrastructure [2]. The flare up started on July 27 8am utc and ended on July 27 17:00 utc, see screenshots. I've socialized the issue with the CI team and some ways to reduce our reliance on docker.io or any public registry. Sagi and I have a draft design that we'll share on this list after a first round of a POC. We also thought we'd leverage Emilien's awesome work [1] to build containers locally in standalone for widely to reduce our traffic to docker.io and upstream proxies. TLDR, feel free to recheck and wf. Thanks for your patience!! [1] https://review.opendev.org/#/q/status:open++topic:dos_docker.io [2] link to logstash query be sure to change the time range -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: docker.io_2.png Type: image/png Size: 93167 bytes Desc: not available URL: From dms at danplanet.com Tue Jul 28 22:26:46 2020 From: dms at danplanet.com (Dan Smith) Date: Tue, 28 Jul 2020 15:26:46 -0700 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: (Bogdan Dobrelya's message of "Tue, 28 Jul 2020 16:25:04 +0200") References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> Message-ID: > Correct, but heartbeats didn't show off as a reliable solution. There > were WSGI & eventlet related issues [1] with running heartbeats. I > can't recall that was the final outcome of that discussion and what > was the fix. So relying on explicit pings sent by clients could work > better perhaps. > > [1] https://bugs.launchpad.net/tripleo/+bug/1829062 There are two types of heartbeats in and around oslo.messaging, which is why call_monitor was used for the long-running RPC thing. The bug you're referencing is, I believe, talking about heartbeating the api->rabbit connection, and has nothing to do with service-to-service pinging, which this thread is about. The call_monitor stuff Ken mentioned requires the *server* side to do the heartbeating, so something like nova-compute or nova-conductor. Those things aren't running under uwsgi and don't have any problems with threading to accomplish those goals. 
So, if we're talking about generic ping() to provide a robust long-running RPC call, oslo.messaging already does this (if you ask for it). Otherwise, a generic service-to-service ping() doesn't, as was mentioned, really mean anything at all about the ability to do meaningful work (other than further saturate the message bus). --Dan From rosmaita.fossdev at gmail.com Tue Jul 28 23:33:35 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 28 Jul 2020 19:33:35 -0400 Subject: [cinder] victoria virtual mid-cycle part 2 poll Message-ID: <47fd9988-2ef4-bc9d-73eb-d655e30d26b4@gmail.com> Hello Cinder team and fellow travelers, Part 2 of our Victoria virtual mid-cycle, to be held at R-9 (the week of 10 August) is fast approaching, so we need to pick a date/time for the 2-hour virtual event. Please indicate your availability on the following poll: https://doodle.com/poll/ywdrhpy78kuizusn Please respond before 12:00 UTC on Tuesday 4 August. thanks, brian From laurentfdumont at gmail.com Wed Jul 29 05:55:00 2020 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Wed, 29 Jul 2020 01:55:00 -0400 Subject: [ops] Reviving OSOps ? In-Reply-To: References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> Message-ID: Interested in this as well. We use Openstack a $Dayjob :) On Mon, Jul 27, 2020 at 2:52 PM Amy Marrich wrote: > +1 on combining this in with the existing SiG and efforts. > > Amy (spotz) > > On Mon, Jul 27, 2020 at 1:02 PM Sean McGinnis > wrote: > >> >> >> If Osops should be considered distinct from OpenStack >> > >> > That feels like the wrong statement to make, even if only implicitly >> > by repo organization. Is there a compelling reason not to have osops >> > under the openstack namespace? >> > >> I think it makes the most sense to be under the openstack namespace. >> >> We have the Operations Docs SIG right now that took on some of the >> operator-specific documentation that no longer had a home. This was a >> consistent issue brought up in the Ops Meetup events. While not "wildly >> successful" in getting a bunch of new and updated docs, it at least has >> accomplished the main goal of getting these docs published to >> docs.openstack.org again, and providing a place where more collaboration >> can (and occasionally does) happen to improve those docs. >> >> I think we could probably expand the scope of this SIG. Especially >> considering it is a pretty low-volume SIG anyway. I would be good with >> changing this to something like the "Operator Docs and Tooling SIG" and >> getting any of these useful tooling repos under governance through that. >> I personally wouldn't be able to spend a lot of time working on anything >> under the SIG, but I'd be happy to keep an eye out for any new reviews >> and help get those through. >> >> Sean >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramishra at redhat.com Wed Jul 29 02:49:54 2020 From: ramishra at redhat.com (Rabi Mishra) Date: Wed, 29 Jul 2020 08:19:54 +0530 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Tue, Jul 28, 2020, 18:59 Emilien Macchi wrote: > > > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz wrote: > >> On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >> wrote: >> > >> > >> > >> > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >> wrote: >> >> >> >> FYI... 
>> >> >> >> If you find your jobs are failing with an error similar to [1], you >> have been rate limited by docker.io via the upstream mirror system and >> have hit [2]. I've been discussing the issue w/ upstream infra, rdo-infra >> and a few CI engineers. >> >> >> >> There are a few ways to mitigate the issue however I don't see any of >> the options being completed very quickly so I'm asking for your patience >> while this issue is socialized and resolved. >> >> >> >> For full transparency we're considering the following options. >> >> >> >> 1. move off of docker.io to quay.io >> > >> > >> > quay.io also has API rate limit: >> > https://docs.quay.io/issues/429.html >> > >> > Now I'm not sure about how many requests per seconds one can do vs the >> other but this would need to be checked with the quay team before changing >> anything. >> > Also quay.io had its big downtimes as well, SLA needs to be considered. >> > >> >> 2. local container builds for each job in master, possibly ussuri >> > >> > >> > Not convinced. >> > You can look at CI logs: >> > - pulling / updating / pushing container images from docker.io to >> local registry takes ~10 min on standalone (OVH) >> > - building containers from scratch with updated repos and pushing them >> to local registry takes ~29 min on standalone (OVH). >> > >> >> >> >> 3. parent child jobs upstream where rpms and containers will be build >> and host artifacts for the child jobs >> > >> > >> > Yes, we need to investigate that. >> > >> >> >> >> 4. remove some portion of the upstream jobs to lower the impact we >> have on 3rd party infrastructure. >> > >> > >> > I'm not sure I understand this one, maybe you can give an example of >> what could be removed? >> >> We need to re-evaulate our use of scenarios (e.g. we have two >> scenario010's both are non-voting). There's a reason we historically >> didn't want to add more jobs because of these types of resource >> constraints. I think we've added new jobs recently and likely need to >> reduce what we run. Additionally we might want to look into reducing >> what we run on stable branches as well. >> > > Oh... removing jobs (I thought we would remove some steps of the jobs). > Yes big +1, this should be a continuous goal when working on CI, and > always evaluating what we need vs what we run now. > > We should look at: > 1) services deployed in scenarios that aren't worth testing (e.g. > deprecated or unused things) (and deprecate the unused things) > 2) jobs themselves (I don't have any example beside scenario010 but I'm > sure there are more). > Isn't scenario010 testing octavia? Though I've seen toggling between voting/non-voting due to different issues for a long time. > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Wed Jul 29 04:17:42 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 29 Jul 2020 06:17:42 +0200 Subject: [ops] Reviving OSOps ? In-Reply-To: References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> Message-ID: +1 Laurent Dumont schrieb am Mi., 29. Juli 2020, 04:00: > Interested in this as well. We use Openstack a $Dayjob :) > > On Mon, Jul 27, 2020 at 2:52 PM Amy Marrich wrote: > >> +1 on combining this in with the existing SiG and efforts. 
>> >> Amy (spotz) >> >> On Mon, Jul 27, 2020 at 1:02 PM Sean McGinnis >> wrote: >> >>> >>> >> If Osops should be considered distinct from OpenStack >>> > >>> > That feels like the wrong statement to make, even if only implicitly >>> > by repo organization. Is there a compelling reason not to have osops >>> > under the openstack namespace? >>> > >>> I think it makes the most sense to be under the openstack namespace. >>> >>> We have the Operations Docs SIG right now that took on some of the >>> operator-specific documentation that no longer had a home. This was a >>> consistent issue brought up in the Ops Meetup events. While not "wildly >>> successful" in getting a bunch of new and updated docs, it at least has >>> accomplished the main goal of getting these docs published to >>> docs.openstack.org again, and providing a place where more collaboration >>> can (and occasionally does) happen to improve those docs. >>> >>> I think we could probably expand the scope of this SIG. Especially >>> considering it is a pretty low-volume SIG anyway. I would be good with >>> changing this to something like the "Operator Docs and Tooling SIG" and >>> getting any of these useful tooling repos under governance through that. >>> I personally wouldn't be able to spend a lot of time working on anything >>> under the SIG, but I'd be happy to keep an eye out for any new reviews >>> and help get those through. >>> >>> Sean >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Wed Jul 29 08:25:41 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 29 Jul 2020 10:25:41 +0200 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On 7/28/20 6:09 PM, Wesley Hayutin wrote: > > > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi > wrote: > > > > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz > wrote: > > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi > > wrote: > > > > > > > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin > > wrote: > >> > >> FYI... > >> > >> If you find your jobs are failing with an error similar to > [1], you have been rate limited by docker.io > via the upstream mirror system and have hit [2].  I've been > discussing the issue w/ upstream infra, rdo-infra and a few CI > engineers. > >> > >> There are a few ways to mitigate the issue however I don't > see any of the options being completed very quickly so I'm > asking for your patience while this issue is socialized and > resolved. > >> > >> For full transparency we're considering the following options. > >> > >> 1. move off of docker.io to quay.io > > > > > > > quay.io also has API rate limit: > > https://docs.quay.io/issues/429.html > > > > Now I'm not sure about how many requests per seconds one can > do vs the other but this would need to be checked with the quay > team before changing anything. > > Also quay.io had its big downtimes as well, > SLA needs to be considered. > > > >> 2. local container builds for each job in master, possibly > ussuri > > > > > > Not convinced. > > You can look at CI logs: > > - pulling / updating / pushing container images from > docker.io to local registry takes ~10 min on > standalone (OVH) > > - building containers from scratch with updated repos and > pushing them to local registry takes ~29 min on standalone (OVH). > > > >> > >> 3. parent child jobs upstream where rpms and containers will > be build and host artifacts for the child jobs > > > > > > Yes, we need to investigate that. > > > >> > >> 4. 
remove some portion of the upstream jobs to lower the > impact we have on 3rd party infrastructure. > > > > > > I'm not sure I understand this one, maybe you can give an > example of what could be removed? > > We need to re-evaulate our use of scenarios (e.g. we have two > scenario010's both are non-voting).  There's a reason we > historically > didn't want to add more jobs because of these types of resource > constraints.  I think we've added new jobs recently and likely > need to > reduce what we run. Additionally we might want to look into reducing > what we run on stable branches as well. > > > Oh... removing jobs (I thought we would remove some steps of the jobs). > Yes big +1, this should be a continuous goal when working on CI, and > always evaluating what we need vs what we run now. > > We should look at: > 1) services deployed in scenarios that aren't worth testing (e.g. > deprecated or unused things) (and deprecate the unused things) > 2) jobs themselves (I don't have any example beside scenario010 but > I'm sure there are more). > -- > Emilien Macchi > > > Thanks Alex, Emilien > > +1 to reviewing the catalog and adjusting things on an ongoing basis. > > All.. it looks like the issues with docker.io were > more of a flare up than a change in docker.io policy > or infrastructure [2].  The flare up started on July 27 8am utc and > ended on July 27 17:00 utc, see screenshots. The numbers of image prepare workers and its exponential fallback intervals should be also adjusted. I've analysed the log snippet [0] for the connection reset counts by workers versus the times the rate limiting was triggered. See the details in the reported bug [1]. tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: Conn Reset Counts by a Worker PID: 3 58412 2 58413 3 58415 3 58417 which seems too much of (workers*reconnects) and triggers rate limiting immediately. [0] https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log [1] https://bugs.launchpad.net/tripleo/+bug/1889372 -- Best regards, Bogdan Dobrelya, Irc #bogdando From bdobreli at redhat.com Wed Jul 29 08:39:44 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 29 Jul 2020 10:39:44 +0200 Subject: [largescale-sig][nova][neutron][oslo] RPC ping In-Reply-To: References: <20200727095744.GK31915@sync> <3d238530-6c84-d611-da4c-553ba836fc02@nemebean.com> <6298ee39-2547-d0f1-4c10-d1cbbb4626b8@redhat.com> Message-ID: <3531b2aa-f195-deb3-e5f2-3e48ba55076f@redhat.com> On 7/29/20 12:26 AM, Dan Smith wrote: >> Correct, but heartbeats didn't show off as a reliable solution. There >> were WSGI & eventlet related issues [1] with running heartbeats. I >> can't recall that was the final outcome of that discussion and what >> was the fix. So relying on explicit pings sent by clients could work >> better perhaps. >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1829062 > > There are two types of heartbeats in and around oslo.messaging, which is > why call_monitor was used for the long-running RPC thing. The bug you're > referencing is, I believe, talking about heartbeating the api->rabbit > connection, and has nothing to do with service-to-service pinging, which > this thread is about. > > The call_monitor stuff Ken mentioned requires the *server* side to do > the heartbeating, so something like nova-compute or > nova-conductor. 
Those things aren't running under uwsgi and don't have > any problems with threading to accomplish those goals. > > So, if we're talking about generic ping() to provide a robust > long-running RPC call, oslo.messaging already does this (if you ask for > it). Otherwise, a generic service-to-service ping() doesn't, as was > mentioned, really mean anything at all about the ability to do > meaningful work (other than further saturate the message bus). Thank you for that great information Dan, Ken. Then please disregard that mistakenly highlighted aspect. Didn't want to derail the thread by that apparently unrelated side case. I believe the original intention for RPC ping was to have something initated by clients, not server-side? That may be useful when running in Kuberenetes pod with aliveness/readiness probes set up. While the latter may be not the best fit for RPC ping indeed, the former seems like a much better way to check aliveness than just checking TCP connection to rabbit port? > > --Dan > -- Best regards, Bogdan Dobrelya, Irc #bogdando From jpena at redhat.com Wed Jul 29 10:35:52 2020 From: jpena at redhat.com (Javier Pena) Date: Wed, 29 Jul 2020 06:35:52 -0400 (EDT) Subject: [infra] CentOS support for mirror role in system-config In-Reply-To: <20200723042103.GA1740223@fedora19.localdomain> References: <287457836.42289622.1595319924417.JavaMail.zimbra@redhat.com> <0ef5ba20-2fd2-4e39-b617-08a54279794a@www.fastmail.com> <20200723042103.GA1740223@fedora19.localdomain> Message-ID: <1047658106.44152074.1596018952840.JavaMail.zimbra@redhat.com> > On Tue, Jul 21, 2020 at 09:30:19AM -0700, Clark Boylan wrote: > > One specific concern along these lines is we've added https support > > to the mirrors. > > Another thing I can see coming is kafs support; which requires recent > kernels but is becoming available in Debian. Just another area we'll > probably want to play in that is distro specific. > > > Would RDO expect us to coordinate upstream changes to the mirrors > > with them? > > Perhaps we should quantify what the bits are we need? > > As I mentioned, I've been shy to move the openafs roles outside > system-config because they rely on debs/rpms built specifically by us > to work around no-packages (rpm) or out of date packages (deb). I > don't want anyone to think they're generic then we break something > when we update the packages for our own purposes. > > There isn't really any magic in the the apache setup done in the > mirror role; it's more or less a straight "install packages put config > in" role. That argument cuts both ways -- it's not much for > system-config to maintain but it's not really much to duplicate > outside. > > The mirror config I can see us wanting to be in sync with. I'd be > happy to move that into a separate role, with a few paramaters to make > it write out in different locations, etc. instead of lumping it all in > with the server setup? > > Is that a compromise position between keeping centos servers in > system-config and making things reusable? Are there other roles of > concern? > This could work. Besides the kerberos-client and openafs-client roles (which should be relatively straightforward to replicate if needed), that would be all we need to keep the configuration in sync. 
Regards, Javier > -i > > > From mdulko at redhat.com Wed Jul 29 11:02:35 2020 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Wed, 29 Jul 2020 13:02:35 +0200 Subject: privsep helper command exited non-zero (1) [openstack-dev][kuryr] In-Reply-To: References: <89671293-538E-40D4-9FC7-5355744BCECC@baidu.com> <2b387f8a48ed6d408af062d6759877b9260d8415.camel@redhat.com> Message-ID: <8177dd66672b4c8f755ad4399141e342028e2aed.camel@redhat.com> > Hi Dulko > Thanks for your reply. > > My OS is Centos 7.6 + python 3.7 > Kuryr-daemon --version = 2.1.0 > > I run the kuryr-daemon as root. > I run the kuryr not as pods. > I will show you the conf of kuryr-daemon: > > Waiting for your reply. > > Thanks. > ========================================== Please don't remove mailing list from the thread. Hm, I googled some older answer that maybe privsep-helper is not installed where kuryr-daemon is executed? In terms of Kuryr stuff that file has almost nothing set, so it isn't really giving much information, so - what's the Neutron plugin being used? Is the [kubernetes]pod_vif_driver just neutron-vif? I think for that to work you need to set [neutron_defaults]ovs_bridge for the daemon and [neutron_defaults]pod_subnet for controller. Can you enable debug for more logs from the error? On Wed, 2020-07-29 at 02:39 +0000, Niu,Lixin wrote: > > 在 2020/7/29 上午1:15,“Michał Dulko” 写入: > > On Tue, 2020-07-28 at 12:28 +0000, Niu,Lixin wrote: > > HI List > > When I run the kuryr, and launch the pods from k8s, I get some > > error like the title. > > Could you please help me to solve this problem? Please give > > some tips. Thanks a lot. > > > > 2020-07-28 16:35:22.084 27610 ERROR os_vif [-] Failed to plug vif VIFBridge(active=False,address=fa:16:3e:f2:ff:e6,bridge_name='qbradfc2b63-47',has_traffic_filtering=True,id=adfc2b63-471e-4fc6-b7be-72af56ee1f27,network=Network(652fdbae-281d-4475-b883-dcfecb821cbd),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapadfc2b63-47'): oslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited non-zero (1) > > > > Hi, > > This seems pretty vague, but my first bet would be that kuryr-daemon is > not running as root? What's the exact configuration you use - i.e. > Neutron plugin, Kuryr binding plugin (nested vs neutron), are you > running Kuryr services as pods? > > Thanks, > Michał > > > From whayutin at redhat.com Wed Jul 29 13:13:24 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 29 Jul 2020 07:13:24 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: > On 7/28/20 6:09 PM, Wesley Hayutin wrote: > > > > > > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi > > wrote: > > > > > > > > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz > > wrote: > > > > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi > > > wrote: > > > > > > > > > > > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin > > > wrote: > > >> > > >> FYI... > > >> > > >> If you find your jobs are failing with an error similar to > > [1], you have been rate limited by docker.io > > via the upstream mirror system and have hit [2]. I've been > > discussing the issue w/ upstream infra, rdo-infra and a few CI > > engineers. > > >> > > >> There are a few ways to mitigate the issue however I don't > > see any of the options being completed very quickly so I'm > > asking for your patience while this issue is socialized and > > resolved. 
> > >> > > >> For full transparency we're considering the following > options. > > >> > > >> 1. move off of docker.io to quay.io > > > > > > > > > > > quay.io also has API rate limit: > > > https://docs.quay.io/issues/429.html > > > > > > Now I'm not sure about how many requests per seconds one can > > do vs the other but this would need to be checked with the quay > > team before changing anything. > > > Also quay.io had its big downtimes as well, > > SLA needs to be considered. > > > > > >> 2. local container builds for each job in master, possibly > > ussuri > > > > > > > > > Not convinced. > > > You can look at CI logs: > > > - pulling / updating / pushing container images from > > docker.io to local registry takes ~10 min on > > standalone (OVH) > > > - building containers from scratch with updated repos and > > pushing them to local registry takes ~29 min on standalone (OVH). > > > > > >> > > >> 3. parent child jobs upstream where rpms and containers will > > be build and host artifacts for the child jobs > > > > > > > > > Yes, we need to investigate that. > > > > > >> > > >> 4. remove some portion of the upstream jobs to lower the > > impact we have on 3rd party infrastructure. > > > > > > > > > I'm not sure I understand this one, maybe you can give an > > example of what could be removed? > > > > We need to re-evaulate our use of scenarios (e.g. we have two > > scenario010's both are non-voting). There's a reason we > > historically > > didn't want to add more jobs because of these types of resource > > constraints. I think we've added new jobs recently and likely > > need to > > reduce what we run. Additionally we might want to look into > reducing > > what we run on stable branches as well. > > > > > > Oh... removing jobs (I thought we would remove some steps of the > jobs). > > Yes big +1, this should be a continuous goal when working on CI, and > > always evaluating what we need vs what we run now. > > > > We should look at: > > 1) services deployed in scenarios that aren't worth testing (e.g. > > deprecated or unused things) (and deprecate the unused things) > > 2) jobs themselves (I don't have any example beside scenario010 but > > I'm sure there are more). > > -- > > Emilien Macchi > > > > > > Thanks Alex, Emilien > > > > +1 to reviewing the catalog and adjusting things on an ongoing basis. > > > > All.. it looks like the issues with docker.io were > > more of a flare up than a change in docker.io policy > > or infrastructure [2]. The flare up started on July 27 8am utc and > > ended on July 27 17:00 utc, see screenshots. > > The numbers of image prepare workers and its exponential fallback > intervals should be also adjusted. I've analysed the log snippet [0] for > the connection reset counts by workers versus the times the rate > limiting was triggered. See the details in the reported bug [1]. > > tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: > > Conn Reset Counts by a Worker PID: > 3 58412 > 2 58413 > 3 58415 > 3 58417 > > which seems too much of (workers*reconnects) and triggers rate limiting > immediately. > > [0] > > https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log > > [1] https://bugs.launchpad.net/tripleo/+bug/1889372 > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > FYI.. The issue w/ "too many requests" is back. 
Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Tue Jul 28 21:46:27 2020 From: thomas.king at gmail.com (Thomas King) Date: Tue, 28 Jul 2020 15:46:27 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Ruslanas has been a tremendous help. To catch up the discussion lists... 1. I enabled Neutron segments. 2. I renamed the existing segments for each network so they'll make sense. 3. I attempted to create a segment for a remote subnet (it is using DHCP relay) and this was the error that is blocking me. This is where the docs do not cover: [root at sea-maas-controller ~(keystone_admin)]# openstack network segment create --physical-network remote146-30-32 --network-type flat --network baremetal seg-remote-146-30-32 BadRequestException: 400: Client Error for url: http://10.146.30.65:9696/v2.0/segments, Invalid input for operation: physical_network 'remote146-30-32' unknown for flat provider network. I've asked Ruslanas to clarify how their physical networks correspond to their remote networks. They have a single provider network and multiple segments tied to multiple physical networks. However, if anyone can shine some light on this, I would greatly appreciate it. How should neutron's configurations accommodate remote networks<->Neutron segments when I have only one physical network attachment for provisioning? Thanks! Tom King On Wed, Jul 15, 2020 at 3:33 PM Thomas King wrote: > That helps a lot, thank you! > > "I use only one network..." > This bit seems to go completely against the Neutron segments > documentation. When you have access, please let me know if Triple-O is > using segments or some other method. > > I greatly appreciate this, this is a tremendous help. > > Tom King > > On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis > wrote: > >> Hi Thomas, >> >> I have a bit complicated setup from tripleo side :) I use only one >> network (only ControlPlane). thanks to Harold, he helped to make it work >> for me. >> >> Yes, as written in the tripleo docs for leaf networks, it use the same >> neutron network, different subnets. so neutron network is ctlplane (I >> think) and have ctlplane-subnet, remote-provision and remote-KI :)) that >> generates additional lines in "ip r s" output for routing "foreign" subnets >> through correct gw, if you would have isolated networks, by vlans and ports >> this would apply for each subnet different gw... I believe you >> know/understand that part. >> >> remote* subnets have dhcp-relay setup by network team... do not ask >> details for that. I do not know how to, but can ask :) >> >> >> in undercloud/tripleo i have 2 dhcp servers, one is for introspection, >> another for provide/cleanup and deployment process. >> >> all of those subnets have organization level tagged networks and are >> tagged on network devices, but they are untagged on provisioning >> interfaces/ports, as in general pxe should be untagged, but some nic's can >> do vlan untag on nic/bios level. but who cares!? >> >> I just did a brief check on your first post, I think I have simmilar >> setup to yours :)) I will check in around 12hours :)) more deaply, as will >> be at work :))) >> >> >> P.S. sorry for wrong terms, I am bad at naming. 
>> >> >> On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: >> >>> Ruslanas, that would be excellent! >>> >>> I will reply to you directly for details later unless the maillist would >>> like the full thread. >>> >>> Some preliminary questions: >>> >>> - Do you have a separate physical interface for the segment(s) used >>> for your remote subnets? >>> The docs state each segment must have a unique physical network >>> name, which suggests a separate physical interface for each segment unless >>> I'm misunderstanding something. >>> - Are your provisioning segments all on the same Neutron network? >>> - Are you using tagged switchports or access switchports to your >>> Ironic server(s)? >>> >>> Thanks, >>> Tom King >>> >>> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis >>> wrote: >>> >>>> I have deployed that with tripleO, but now we are recabling and >>>> redeploying it. So once I have it running I can share my configs, just name >>>> which you want :) >>>> >>>> On Tue, 14 Jul 2020 at 18:40, Thomas King >>>> wrote: >>>> >>>>> I have. That's the Triple-O docs and they don't go through the normal >>>>> .conf files to explain how it works outside of Triple-O. It has some ideas >>>>> but no running configurations. >>>>> >>>>> Tom King >>>>> >>>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >>>>> wrote: >>>>> >>>>>> hi, have you checked: >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>>> ? >>>>>> I am following this link. I only have one network, having different >>>>>> issues tho ;) >>>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngtech1ltd at gmail.com Wed Jul 29 05:22:44 2020 From: ngtech1ltd at gmail.com (Eliezer Croitor) Date: Wed, 29 Jul 2020 08:22:44 +0300 Subject: Looking for recommendation what OS to use for a minimal installation Message-ID: <000301d66568$4a80d690$df8283b0$@gmail.com> Hey Everybody, In the last month I have tried to install OpenStack minimal from the docs: https://docs.openstack.org/install-guide/openstack-services.html#minimal-dep loyment-for-train https://docs.openstack.org/install-guide/openstack-services.html#minimal-dep loyment-for-ussuri on CentOS 7 and CentOS 8. With the packages from CentOS I need to "patch" or "fix" the installed conf files of httpd and couple other files around. I have tried DevStack and PackStack before but I want to move to the next step. Any recommendation where to start? What OS to use? Etc. Thanks. Eliezer ---- Eliezer Croitoru Tech Support Mobile: +972-5-28704261 Email: ngtech1ltd at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yan.y.zhao at intel.com Wed Jul 29 08:05:03 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 29 Jul 2020 16:05:03 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200727162321.7097070e@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> Message-ID: <20200729080503.GB28676@joy-OptiPlex-7040> On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > On Mon, 27 Jul 2020 15:24:40 +0800 > Yan Zhao wrote: > > > > > As you indicate, the vendor driver is responsible for checking version > > > > information embedded within the migration stream. Therefore a > > > > migration should fail early if the devices are incompatible. Is it > > > but as I know, currently in VFIO migration protocol, we have no way to > > > get vendor specific compatibility checking string in migration setup stage > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > In this way, for devices who does not save device data in precopy stage, > > > the migration compatibility checking is as late as in stop-and-copy > > > stage, which is too late. > > > do you think we need to add the getting/checking of vendor specific > > > compatibility string early in save_setup stage? > > > > > hi Alex, > > after an offline discussion with Kevin, I realized that it may not be a > > problem if migration compatibility check in vendor driver occurs late in > > stop-and-copy phase for some devices, because if we report device > > compatibility attributes clearly in an interface, the chances for > > libvirt/openstack to make a wrong decision is little. > > I think it would be wise for a vendor driver to implement a pre-copy > phase, even if only to send version information and verify it at the > target. Deciding you have no device state to send during pre-copy does > not mean your vendor driver needs to opt-out of the pre-copy phase > entirely. Please also note that pre-copy is at the user's discretion, > we've defined that we can enter stop-and-copy at any point, including > without a pre-copy phase, so I would recommend that vendor drivers > validate compatibility at the start of both the pre-copy and the > stop-and-copy phases. > ok. got it! > > so, do you think we are now arriving at an agreement that we'll give up > > the read-and-test scheme and start to defining one interface (perhaps in > > json format), from which libvirt/openstack is able to parse and find out > > compatibility list of a source mdev/physical device? > > Based on the feedback we've received, the previously proposed interface > is not viable. I think there's agreement that the user needs to be > able to parse and interpret the version information. Using json seems > viable, but I don't know if it's the best option. Is there any > precedent of markup strings returned via sysfs we could follow? I found some examples of using formatted string under /sys, mostly under tracing. maybe we can do a similar implementation. 
#cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format name: kvm_mmio ID: 32 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:u32 type; offset:8; size:4; signed:0; field:u32 len; offset:12; size:4; signed:0; field:u64 gpa; offset:16; size:8; signed:0; field:u64 val; offset:24; size:8; signed:0; print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" }, { 2, "write" }), REC->len, REC->gpa, REC->val #cat /sys/devices/pci0000:00/0000:00:02.0/uevent DRIVER=vfio-pci PCI_CLASS=30000 PCI_ID=8086:591D PCI_SUBSYS_ID=8086:2212 PCI_SLOT_NAME=0000:00:02.0 MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > > Your idea of having both a "self" object and an array of "compatible" > objects is perhaps something we can build on, but we must not assume > PCI devices at the root level of the object. Providing both the > mdev-type and the driver is a bit redundant, since the former includes > the latter. We can't have vendor specific versioning schemes though, > ie. gvt-version. We need to agree on a common scheme and decide which > fields the version is relative to, ex. just the mdev type? what about making all comparing fields vendor specific? userspace like openstack only needs to parse and compare if target device is within source compatible list without understanding the meaning of each field. > I had also proposed fields that provide information to create a > compatible type, for example to create a type_x2 device from a type_x1 > mdev type, they need to know to apply an aggregation attribute. If we > need to explicitly list every aggregation value and the resulting type, > I think we run aground of what aggregation was trying to avoid anyway, > so we might need to pick a language that defines variable substitution > or some kind of tagging. For example if we could define ${aggr} as an > integer within a specified range, then we might be able to define a type > relative to that value (type_x${aggr}) which requires an aggregation > attribute using the same value. I dunno, just spit balling. Thanks, what about a migration_compatible attribute under device node like below? #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible SELF: device_type=pci device_id=8086591d mdev_type=i915-GVTg_V5_2 aggregator=1 pv_mode="none+ppgtt+context" interface_version=3 COMPATIBLE: device_type=pci device_id=8086591d mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} aggregator={val1}/2 pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} interface_version={val3:int:2,3} COMPATIBLE: device_type=pci device_id=8086591d mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} aggregator={val1}/2 pv_mode="" #"" meaning empty, could be absent in a compatible device interface_version=1 #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID2/migration_compatible SELF: device_type=pci device_id=8086591d mdev_type=i915-GVTg_V5_4 aggregator=2 interface_version=1 COMPATIBLE: device_type=pci device_id=8086591d mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} aggregator={val1}/2 interface_version=1 Notes: - A COMPATIBLE object is a line starting with COMPATIBLE. It specifies a list of compatible devices that are allowed to migrate in. 
The reason to allow multiple COMPATIBLE objects is that when it is hard to express a complex compatible logic in one COMPATIBLE object, a simple enumeration is still a fallback. in the above example, device UUID2 is in the compatible list of device UUID1, but device UUID1 is not in the compatible list of device UUID2, so device UUID2 is able to migrate to device UUID1, but device UUID1 is not able to migrate to device UUID2. - fields under each object are of "and" relationship to each other, meaning all fields of SELF object of a target device must be equal to corresponding fields of a COMPATIBLE object of source device, otherwise it is regarded as not compatible. - each field, however, is able to specify multiple allowed values, using variables as explained below. - variables are represented with {}, the first appearance of one variable specifies its type and allowed list. e.g. {val1:int:1,2,4,8} represents var1 whose type is integer and allowed values are 1, 2, 4, 8. - vendors are able to specify which fields are within the comparing list and which fields are not. e.g. for physical VF migration, it may not choose mdev_type as a comparing field, and maybe use driver name instead. Thanks Yan From smooney at redhat.com Wed Jul 29 11:28:46 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 29 Jul 2020 12:28:46 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200729080503.GB28676@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> Message-ID: On Wed, 2020-07-29 at 16:05 +0800, Yan Zhao wrote: > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > On Mon, 27 Jul 2020 15:24:40 +0800 > > Yan Zhao wrote: > > > > > > > As you indicate, the vendor driver is responsible for checking version > > > > > information embedded within the migration stream. Therefore a > > > > > migration should fail early if the devices are incompatible. Is it > > > > > > > > but as I know, currently in VFIO migration protocol, we have no way to > > > > get vendor specific compatibility checking string in migration setup stage > > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > > In this way, for devices who does not save device data in precopy stage, > > > > the migration compatibility checking is as late as in stop-and-copy > > > > stage, which is too late. > > > > do you think we need to add the getting/checking of vendor specific > > > > compatibility string early in save_setup stage? > > > > > > > > > > hi Alex, > > > after an offline discussion with Kevin, I realized that it may not be a > > > problem if migration compatibility check in vendor driver occurs late in > > > stop-and-copy phase for some devices, because if we report device > > > compatibility attributes clearly in an interface, the chances for > > > libvirt/openstack to make a wrong decision is little. > > > > I think it would be wise for a vendor driver to implement a pre-copy > > phase, even if only to send version information and verify it at the > > target. Deciding you have no device state to send during pre-copy does > > not mean your vendor driver needs to opt-out of the pre-copy phase > > entirely. 
Please also note that pre-copy is at the user's discretion, > > we've defined that we can enter stop-and-copy at any point, including > > without a pre-copy phase, so I would recommend that vendor drivers > > validate compatibility at the start of both the pre-copy and the > > stop-and-copy phases. > > > > ok. got it! > > > > so, do you think we are now arriving at an agreement that we'll give up > > > the read-and-test scheme and start to defining one interface (perhaps in > > > json format), from which libvirt/openstack is able to parse and find out > > > compatibility list of a source mdev/physical device? > > > > Based on the feedback we've received, the previously proposed interface > > is not viable. I think there's agreement that the user needs to be > > able to parse and interpret the version information. Using json seems > > viable, but I don't know if it's the best option. Is there any > > precedent of markup strings returned via sysfs we could follow? > > I found some examples of using formatted string under /sys, mostly under > tracing. maybe we can do a similar implementation. > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format > > name: kvm_mmio > ID: 32 > format: > field:unsigned short common_type; offset:0; size:2; signed:0; > field:unsigned char common_flags; offset:2; size:1; signed:0; > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > field:int common_pid; offset:4; size:4; signed:1; > > field:u32 type; offset:8; size:4; signed:0; > field:u32 len; offset:12; size:4; signed:0; > field:u64 gpa; offset:16; size:8; signed:0; > field:u64 val; offset:24; size:8; signed:0; > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" > }, { 2, "write" }), REC->len, REC->gpa, REC->val >

This is not JSON format, and it is not super friendly to parse.

> > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent > DRIVER=vfio-pci > PCI_CLASS=30000 > PCI_ID=8086:591D > PCI_SUBSYS_ID=8086:2212 > PCI_SLOT_NAME=0000:00:02.0 > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 >

This is INI/conf format, which is pretty simple to parse, so it would be fine. That said, you could also have a version or capability directory with a file for each key and a single value. Personally I would prefer to do only one read; listing the files in a directory and then reading them all to build the data structure myself is doable, but the simple INI format used for uevent seems the best of the three options above.

> > > > Your idea of having both a "self" object and an array of "compatible" > > objects is perhaps something we can build on, but we must not assume > > PCI devices at the root level of the object. Providing both the > > mdev-type and the driver is a bit redundant, since the former includes > > the latter. We can't have vendor specific versioning schemes though, > > ie. gvt-version. We need to agree on a common scheme and decide which > > fields the version is relative to, ex. just the mdev type? > > what about making all comparing fields vendor specific? > userspace like openstack only needs to parse and compare if target > device is within source compatible list without understanding the meaning > of each field.

That kind of defeats the reason for having them be parsable. The reason OpenStack wants to be able to understand the capabilities is so that we can statically declare the capabilities of devices ahead of time and our scheduler can select hosts based on them. If the keys and data are opaque to userspace because they are just random vendor-specific blobs, we can't do that.
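To make the parsing point concrete, a flat key=value attribute in the uevent style above needs only a few lines on the consumer side. This is purely an illustrative sketch (the helper names and the choice of fields to compare are mine, not part of any proposed interface):

SAMPLE = """\
DRIVER=vfio-pci
PCI_CLASS=30000
PCI_ID=8086:591D
PCI_SUBSYS_ID=8086:2212
PCI_SLOT_NAME=0000:00:02.0
"""

def parse_keyvals(text):
    # Parse flat KEY=VALUE lines (uevent style) into a dict.
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return fields

def fields_match(src, dst, keys=("PCI_ID", "DRIVER")):
    # 'keys' is an arbitrary example, not a proposed comparison set.
    return all(src.get(k) == dst.get(k) for k in keys)

src = parse_keyvals(SAMPLE)   # e.g. read from the source host's sysfs
dst = parse_keyvals(SAMPLE)   # e.g. read from the candidate target host
print(fields_match(src, dst))  # -> True

Either way, the important property is that an orchestrator can read the individual fields and reason about them, rather than treating the whole attribute as an opaque blob.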
is the keys and data are opaquce to userspace becaue they are just random vendor sepecific blobs we cant do that. > > > I had also proposed fields that provide information to create a > > compatible type, for example to create a type_x2 device from a type_x1 > > mdev type, they need to know to apply an aggregation attribute. If we > > need to explicitly list every aggregation value and the resulting type, > > I think we run aground of what aggregation was trying to avoid anyway, > > so we might need to pick a language that defines variable substitution > > or some kind of tagging. For example if we could define ${aggr} as an > > integer within a specified range, then we might be able to define a type > > relative to that value (type_x${aggr}) which requires an aggregation > > attribute using the same value. I dunno, just spit balling. Thanks, > > what about a migration_compatible attribute under device node like > below? rather then listing comaptiable devices it would be better if you could declaritivly list the feature supported and we could compare those along with a simple semver version string. > > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible > SELF: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_2 > aggregator=1 > pv_mode="none+ppgtt+context" > interface_version=3 > COMPATIBLE: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} this mixed notation will be hard to parse so i would avoid that. > aggregator={val1}/2 > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > interface_version={val3:int:2,3} > COMPATIBLE: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > aggregator={val1}/2 > pv_mode="" #"" meaning empty, could be absent in a compatible device > interface_version=1 if you presented this information the only way i could see to use it would be to extract the mdev_type name and interface_vertion and build a database table as follows source_mdev_type | source_version | target_mdev_type | target_version i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | {val3:int:2,3} i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | 1 this would either reuiqre use to use a post placment sechudler filter to itrospec this data base or thansform the target_mdev_type and target_version colum data into CUSTOM_* traits we apply to our placment resouce providers and we would have to prefrom multiple reuqest for each posible compatiable alternitive. if the vm has muplite mdevs this is combinatorially problmenatic as it is 1 query for each device * the number of possible compatible devices for that device. in other word if this is just opaque data we cant ever represent it efficently in our placment service and have to fall back to an explisive post placment schdluer filter base on the db table approch. this also ignore the fact that at present the mdev_type cannot change druing a migration so the compatiable devicve with a different mdev type would not be considerd accpetable choice in openstack. they way you select a host with a specific vgpu mdev type today is to apply a custome trait which is CUSTOM_ to the vGPU resouce provider and then in the flavor you request 1 allcoaton of vGPU and require the CUSTOM_ trait. so going form i915-GVTg_V5_2 to i915-GVTg_V5_{val1:int:1,2,4,8} would not currently be compatiable with that workflow. 
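As a rough illustration of that trait-based workflow, here is a short Python sketch (the helper names and the trait normalisation rule are assumptions for the example, not existing nova/placement code) that expands a templated value such as i915-GVTg_V5_{val1:int:1,2,4,8} into the discrete mdev types it stands for, and the CUSTOM_* trait names they would map to on a resource provider:

import re

def expand_template(value):
    # Expand "i915-GVTg_V5_{val1:int:1,2,4,8}" into the discrete values it
    # stands for; a plain value without a {..} template comes back unchanged.
    m = re.search(r"\{\w+:int:([0-9,]+)\}", value)
    if not m:
        return [value]
    return [value[:m.start()] + v + value[m.end():] for v in m.group(1).split(",")]

def to_custom_trait(mdev_type):
    # Approximate the CUSTOM_<MDEV_TYPE> naming used for custom traits:
    # upper-case and replace anything outside A-Z/0-9 with "_".
    return "CUSTOM_" + re.sub(r"[^A-Z0-9]", "_", mdev_type.upper())

for mdev_type in expand_template("i915-GVTg_V5_{val1:int:1,2,4,8}"):
    print(mdev_type, "->", to_custom_trait(mdev_type))
# i915-GVTg_V5_1 -> CUSTOM_I915_GVTG_V5_1, i915-GVTg_V5_2 -> CUSTOM_I915_GVTG_V5_2, ...

Even with such an expansion, every compatible alternative still becomes its own trait and its own placement query, which is exactly the combinatorial problem described above.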
> #cat /sys/bus/pci/dei915-GVTg_V5_{val1:int:1,2,4,8}vices/0000\:00\:i915- > GVTg_V5_{val1:int:1,2,4,8}2.0/UUID2/migration_compatible > SELF: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_4 > aggregator=2 > interface_version=1 > COMPATIBLE: > device_type=pci > device_id=8086591d > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > aggregator={val1}/2 > interface_version=1 by the way this is closer to yaml format then it is to json but it does not align with any exsiting format i know of so that just make the representation needless hard to consume if we are going to use a markup lanag let use a standard one like yaml json or toml and not invent a new one. > > Notes: > - A COMPATIBLE object is a line starting with COMPATIBLE. > It specifies a list of compatible devices that are allowed to migrate > in. > The reason to allow multiple COMPATIBLE objects is that when it > is hard to express a complex compatible logic in one COMPATIBLE > object, a simple enumeration is still a fallback. > in the above example, device UUID2 is in the compatible list of > device UUID1, but device UUID1 is not in the compatible list of device > UUID2, so device UUID2 is able to migrate to device UUID1, but device > UUID1 is not able to migrate to device UUID2. > > - fields under each object are of "and" relationship to each other, meaning > all fields of SELF object of a target device must be equal to corresponding > fields of a COMPATIBLE object of source device, otherwise it is regarded as not > compatible. > > - each field, however, is able to specify multiple allowed values, using > variables as explained below. > > - variables are represented with {}, the first appearance of one variable > specifies its type and allowed list. e.g. > {val1:int:1,2,4,8} represents var1 whose type is integer and allowed > values are 1, 2, 4, 8. > > - vendors are able to specify which fields are within the comparing list > and which fields are not. e.g. for physical VF migration, it may not > choose mdev_type as a comparing field, and maybe use driver name instead. this format might be useful to vendors but from a orcestrator perspecive i dont think this has value to us likely we would not use this api if it was added as it does not help us with schduling. ideally instead fo declaring which other mdev types a device is compatiable with (which could presumably change over time as new device and firmwares are released) i would prefer to see a declaritive non vendor specific api that declares the feature set provided by each mdev_type from which we can infer comaptiablity similar to cpu feature flags. for devices fo the same mdev_type name addtionally a declaritive version sting could also be used if required for addtional compatiablity checks. > > > Thanks > Yan > > From Sebastian.Saemann at netways.de Wed Jul 29 13:24:15 2020 From: Sebastian.Saemann at netways.de (Sebastian Saemann) Date: Wed, 29 Jul 2020 13:24:15 +0000 Subject: [neutron][networking-midonet] Maintainers needed Message-ID: <0AC5AC07-E97E-43CC-B344-A3E992B8CCA4@netways.de> Hi Slawek, we at NETWAYS are running most of our neutron networking on top of midonet and wouldn't be too happy if it gets deprecated and removed. So we would like to take over the maintainer role for this part. Please let me know how to proceed and how we can be onboarded easily. Best regards, Sebastian --  Sebastian Saemann Head of Managed Services NETWAYS Managed Services GmbH | Deutschherrnstr. 
15-19 | D-90429 Nuernberg Tel: +49 911 92885-0 | Fax: +49 911 92885-77 CEO: Julian Hein, Bernd Erk | AG Nuernberg HRB25207 https://netways.de | sebastian.saemann at netways.de ** NETWAYS Web Services - https://nws.netways.de ** From dev.faz at gmail.com Wed Jul 29 14:24:01 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Wed, 29 Jul 2020 16:24:01 +0200 Subject: Looking for recommendation what OS to use for a minimal installation In-Reply-To: <000301d66568$4a80d690$df8283b0$@gmail.com> References: <000301d66568$4a80d690$df8283b0$@gmail.com> Message-ID: Hi, Eliezer Croitor schrieb am Mi., 29. Juli 2020, 15:30: > Hey Everybody, > > Any recommendation where to start? > Would suggest to use kolla-ansible, but this needs a bit container / ansible knowhow. > What OS to use? > CentOS seems fine, but im using Ubuntu. Should not be an issue at all if you use containers. Fabian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Wed Jul 29 15:18:52 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 29 Jul 2020 12:18:52 -0300 Subject: [neutron] bandwidth metering based on remote address In-Reply-To: References: <2890841.xduM2AgYMW@antares> <25308951.foNqEPruJI@antares> Message-ID: Hello Jonas, I created the proposal for the extension concerning local IP addresses in metering label rules. You can find the proposal here: https://bugs.launchpad.net/neutron/+bug/1889431. I am starting today the implementation. On Thu, Jul 9, 2020 at 8:53 AM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > I created a bug track for the extension of the neutron metering > granularities: https://bugs.launchpad.net/neutron/+bug/1886949 > I am never sure about those "paper work", I normally propose the pull > requests, and wait for the guidance of the community. > > About the source/destination filtering, I have not published anything yet. > So far, we defined/specified what we need/want from the Neutron metering > sub-system. Next week I am supposed to start on this matter. Therefore, as > soon as I have updates, I will create the bug report, and pull requests. > You can help me now by reviewing the PR I already have open, and of course, > testing/using it :) > > On Thu, Jul 9, 2020 at 3:54 AM Jonas Schäfer < > jonas.schaefer at cloudandheat.com> wrote: > >> Hello Rafael, >> >> On Dienstag, 7. Juli 2020 14:09:29 CEST Rafael Weingärtner wrote: >> > Hallo Jonas, >> > I have worked to address this specific use case. >> > >> > First, the part of the solution that is already implemented. If you only >> > need to gather metrics in a tenant fashion, you can take a look into >> this >> > PR: https://review.opendev.org/#/c/735605/. That pull request enables >> > operators to configure shared traffic labels, and then, these traffic >> > labels will be exposed/published with different granularities. The >> > different granularities are router, tenant, label, router-label, and >> > tenant-label. The complete explanation can be found in the "RST" >> document >> > that the PR also introduces, where we wrote a complete description of >> > neutron metering, its configs, and usage. You are welcome to review and >> > help us get this PR merged :) >> >> This already looks very useful to us, since it saves us from creating >> labels >> for each and every project. >> >> > So far, if all you need is to measure the whole traffic, but in >> different >> > granularities, that PR will probably be enough. 
>> >> Not quite; as mentioned, we’ll need to carve out specific network areas >> from >> metering, those which are in our DCs, but on the other side of the router >> from >> the customer perspective. >> >> > On the other hand, if you >> > need to create more complex rules to filter by source/destination IPs, >> then >> > we need something else. Interestingly enough, we are working towards >> that. >> > We will extend neutron API, and neutron metering to allow operators to >> use >> > "remote-ip" and "source-ip" to create metering labels rules. >> >> That sounds exactly like what we’d need. >> >> > We also saw the PR that changed the behavior of the "remote-ip" >> property, >> > and the whole confusion it caused (at least for us). However, instead of >> > proposing to revert it, we are working towards enabling the API to >> handle >> > "remote-ip" and "source-ip", which will cover the use case of the person >> > that introduced that commit, and many others such as ours and yours >> > (probably). >> >> Sounds good. Is there a way we can collaborate on this? Is there a >> launchpad >> bug which tracks that? (Also, is there a launchpad thing for the shared >> label >> granularity you’re doing already? I didn’t find one mentioned on the >> gerrit >> page.) >> >> kind regards, >> Jonas Schäfer >> >> > >> > On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < >> > >> > jonas.schaefer at cloudandheat.com> wrote: >> > > Dear list, >> > > >> > > We are trying to implement tenant bandwidth metering at the neutron >> router >> > > level. Since some of the network spaces connected to the external >> > > interface of >> > > the neutron router are supposed to be unmetered, we need to match on >> the >> > > remote address. >> > > >> > > Conveniently, there exists a --remote-ip-prefix option on meter label >> > > create; >> > > however, since [1], its meaning was changed to the exact opposite: >> Instead >> > > of >> > > matching on the *remote* prefix (towards the external interface), it >> > > matches >> > > on the *local* prefix (towards the OS tenant network). >> > > >> > > In an ideal world, we would want to revert that change and instead >> > > introduce a >> > > --local-ip-prefix option which covers that use-case. I suppose this >> is not >> > > a >> > > thing we *should* do though, given that this change made it into a few >> > > releases already. >> > > >> > > Instead, we’ll have to create a new option (which whatever name) + >> > > associated >> > > database schema + iptables rule patterns to implement the feature. >> > > >> > > The questions associated with this are now: >> > > >> > > - Does this make absolutely no sense to anyone? >> > > - What is the process for this? I suppose since this change was made >> > > intentionally and passed review, our desired change needs to go >> through a >> > > feature request process (blueprints maybe?). 
>> > > >> > > kind regards, >> > > Jonas Schäfer >> > > >> > > [1]: https://opendev.org/openstack/neutron/commit/ >> > > >> > > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 >> > > >> > > >> > > -- >> > > Jonas Schäfer >> > > DevOps Engineer >> > > >> > > Cloud&Heat Technologies GmbH >> > > Königsbrücker Straße 96 | 01099 Dresden >> > > +49 351 479 367 37 >> > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com >> > > >> > > New Service: >> > > Managed Kubernetes designed for AI & ML >> > > https://managed-kubernetes.cloudandheat.com/ >> > > >> > > Commercial Register: District Court Dresden >> > > Register Number: HRB 30549 >> > > VAT ID No.: DE281093504 >> > > Managing Director: Nicolas Röhrs >> > > Authorized signatory: Dr. Marius Feldmann >> > > Authorized signatory: Kristina Rübenkamp >> >> >> -- >> Jonas Schäfer >> DevOps Engineer >> >> Cloud&Heat Technologies GmbH >> Königsbrücker Straße 96 | 01099 Dresden >> +49 351 479 367 37 >> jonas.schaefer at cloudandheat.com | www.cloudandheat.com >> >> New Service: >> Managed Kubernetes designed for AI & ML >> https://managed-kubernetes.cloudandheat.com/ >> >> Commercial Register: District Court Dresden >> Register Number: HRB 30549 >> VAT ID No.: DE281093504 >> Managing Director: Nicolas Röhrs >> Authorized signatory: Dr. Marius Feldmann >> Authorized signatory: Kristina Rübenkamp >> > > > -- > Rafael Weingärtner > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Wed Jul 29 16:01:47 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 29 Jul 2020 10:01:47 -0600 Subject: [tripleo] stable/train migration from CentOS-7 to CentOS-8 Message-ID: Greetings, I wanted to give everyone a heads up that the migration of upstream stable/train jobs from CentOS-7 to CentOS-8 [1] is close. CentOS-7 stable/train will continue to be tested in our 3rd party periodic pipeline and builds of CentOS-7 stable/train will be qualified here [2] Please let me know if you have any questions. Thanks! [1] https://review.opendev.org/#/c/738375 [2] https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable2-centos7 -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Wed Jul 29 16:27:03 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 29 Jul 2020 10:27:03 -0600 Subject: [tripleo] stable/train migration from CentOS-7 to CentOS-8 In-Reply-To: References: Message-ID: On Wed, Jul 29, 2020 at 10:01 AM Wesley Hayutin wrote: > Greetings, > > I wanted to give everyone a heads up that the migration of upstream > stable/train jobs from CentOS-7 to CentOS-8 [1] is close. CentOS-7 > stable/train will continue to be tested in our 3rd party periodic pipeline > and builds of CentOS-7 stable/train will be qualified here [2] > > Please let me know if you have any questions. > Thanks! > > > [1] https://review.opendev.org/#/c/738375 > [2] > https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable2-centos7 > > Few corrections now: See all the patches via https://review.opendev.org/#/q/topic:c7-to-c8-train+(status:open+OR+status:merged) I'll note we're going to maintain one centos-7 train job.. multinode-containers in upstream -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iurygregory at gmail.com Wed Jul 29 18:37:21 2020 From: iurygregory at gmail.com (Iury Gregory) Date: Wed, 29 Jul 2020 20:37:21 +0200 Subject: [ironic] let's talk about grenade In-Reply-To: References: Message-ID: Hello everyone, Since we didn't get many responses I will keep the doodle open till Friday =) Em seg., 27 de jul. de 2020 às 17:55, Iury Gregory escreveu: > Hello everyone, > > I'm still on the fight to move our ironic-grenade-dsvm-multinode-multitenant > to zuulv3 [1], you can find some of my findings on the etherpad [2] under `Move > to Zuul v3 Jobs (Iurygregory)`. > > If you are interested in helping out we are going to schedule a meeting to > discuss about this, please use the doodle in [3]. I will close the doodle > on Wed July 29. > > Thanks! > > [1] https://review.opendev.org/705030 > [2] https://etherpad.openstack.org/p/IronicWhiteBoard > [3] https://doodle.com/poll/m69b5zwnsbgcysct > > -- > > > *Att[]'sIury Gregory Melo Ferreira * > *MSc in Computer Science at UFCG* > *Part of the puppet-manager-core team in OpenStack* > *Software Engineer at Red Hat Czech* > *Social*: https://www.linkedin.com/in/iurygregory > *E-mail: iurygregory at gmail.com * > -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgilbert at redhat.com Wed Jul 29 19:05:41 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Wed, 29 Jul 2020 20:05:41 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200727162321.7097070e@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> Message-ID: <20200729190540.GK2795@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Mon, 27 Jul 2020 15:24:40 +0800 > Yan Zhao wrote: > > > > > As you indicate, the vendor driver is responsible for checking version > > > > information embedded within the migration stream. Therefore a > > > > migration should fail early if the devices are incompatible. Is it > > > but as I know, currently in VFIO migration protocol, we have no way to > > > get vendor specific compatibility checking string in migration setup stage > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > In this way, for devices who does not save device data in precopy stage, > > > the migration compatibility checking is as late as in stop-and-copy > > > stage, which is too late. > > > do you think we need to add the getting/checking of vendor specific > > > compatibility string early in save_setup stage? > > > > > hi Alex, > > after an offline discussion with Kevin, I realized that it may not be a > > problem if migration compatibility check in vendor driver occurs late in > > stop-and-copy phase for some devices, because if we report device > > compatibility attributes clearly in an interface, the chances for > > libvirt/openstack to make a wrong decision is little. > > I think it would be wise for a vendor driver to implement a pre-copy > phase, even if only to send version information and verify it at the > target. 
Deciding you have no device state to send during pre-copy does > not mean your vendor driver needs to opt-out of the pre-copy phase > entirely. Please also note that pre-copy is at the user's discretion, > we've defined that we can enter stop-and-copy at any point, including > without a pre-copy phase, so I would recommend that vendor drivers > validate compatibility at the start of both the pre-copy and the > stop-and-copy phases. That's quite curious; from a migration point of view I'd expect if you did want to skip pre-copy, that you'd go through the motions of entering it and then not saving any data and then going to stop-and-copy, rather than having two flows. Note that failing at a late stage of stop-and-copy is a pain; if you've just spent an hour migrating your huge busy VM over, you're going to be pretty annoyed when it goes pop near the end. Dave > > so, do you think we are now arriving at an agreement that we'll give up > > the read-and-test scheme and start to defining one interface (perhaps in > > json format), from which libvirt/openstack is able to parse and find out > > compatibility list of a source mdev/physical device? > > Based on the feedback we've received, the previously proposed interface > is not viable. I think there's agreement that the user needs to be > able to parse and interpret the version information. Using json seems > viable, but I don't know if it's the best option. Is there any > precedent of markup strings returned via sysfs we could follow? > > Your idea of having both a "self" object and an array of "compatible" > objects is perhaps something we can build on, but we must not assume > PCI devices at the root level of the object. Providing both the > mdev-type and the driver is a bit redundant, since the former includes > the latter. We can't have vendor specific versioning schemes though, > ie. gvt-version. We need to agree on a common scheme and decide which > fields the version is relative to, ex. just the mdev type? > > I had also proposed fields that provide information to create a > compatible type, for example to create a type_x2 device from a type_x1 > mdev type, they need to know to apply an aggregation attribute. If we > need to explicitly list every aggregation value and the resulting type, > I think we run aground of what aggregation was trying to avoid anyway, > so we might need to pick a language that defines variable substitution > or some kind of tagging. For example if we could define ${aggr} as an > integer within a specified range, then we might be able to define a type > relative to that value (type_x${aggr}) which requires an aggregation > attribute using the same value. I dunno, just spit balling. Thanks, > > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From rosmaita.fossdev at gmail.com Wed Jul 29 20:28:54 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 29 Jul 2020 16:28:54 -0400 Subject: [cinder] video + IRC meeting survey Message-ID: Today we had our first Cinder meeting held simultaneously in videoconference and on IRC. I have posted a quick survey to see if there's support for continuing this monthly experiment. (The next video + IRC meeting would be the last week of August.) https://rosmaita.wufoo.com/forms/r1twmaya13asv1c/ Please respond before 12:00 UTC on Tuesday, 4 August. thanks! 
brian From gouthampravi at gmail.com Wed Jul 29 21:33:04 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 29 Jul 2020 14:33:04 -0700 Subject: [manila] Please delete some branches Message-ID: Hello, I'd like to request the deletion of some branches in manila that have now transitioned to EOL. These branches can be removed from openstack/manila, openstack/python-manilaclient and openstack/manila-ui: stable/pike stable/ocata I'd also like to request the deletion of "driverfixes" branches from the openstack/manila repository. These branches were created to host vendor fixes to branches that were no longer being tested; however, with our "extended maintenance" stance, we've effectively removed the need for these branches. These branches will no longer be maintained, and so they can be removed as well: driverfixes/mitaka driverfixes/newton driverfixes/ocata Thank you so much for your assistance! Goutham Pacha Ravi -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Wed Jul 29 21:39:55 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 29 Jul 2020 14:39:55 -0700 Subject: [manila][infra] Please delete some branches (was [manila] Please delete some branches) In-Reply-To: References: Message-ID: On Wed, Jul 29, 2020 at 2:33 PM Goutham Pacha Ravi wrote: > Hello, > > I'd like to request the deletion of some branches in manila that have now > transitioned to EOL. These branches can be removed from openstack/manila, > openstack/python-manilaclient and openstack/manila-ui: > > stable/pike > stable/ocata > > I'd also like to request the deletion of "driverfixes" branches from the > openstack/manila repository. These branches were created to host vendor > fixes to branches that were no longer being tested; however, with our > "extended maintenance" stance, we've effectively removed the need for these > branches. These branches will no longer be maintained, and so they can be > removed as well: > > driverfixes/mitaka > driverfixes/newton > driverfixes/ocata > > > Thank you so much for your assistance! > I apologize this email was sent without the [infra] subject line tag, because it was initially sent to openstack-infra at lists.openstack.org, which is now closed. > > Goutham Pacha Ravi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Wed Jul 29 22:33:14 2020 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 29 Jul 2020 16:33:14 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin wrote: > > > > On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya wrote: >> >> On 7/28/20 6:09 PM, Wesley Hayutin wrote: >> > >> > >> > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi > > > wrote: >> > >> > >> > >> > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz > > > wrote: >> > >> > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi >> > > wrote: >> > > >> > > >> > > >> > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin >> > > wrote: >> > >> >> > >> FYI... >> > >> >> > >> If you find your jobs are failing with an error similar to >> > [1], you have been rate limited by docker.io >> > via the upstream mirror system and have hit [2]. I've been >> > discussing the issue w/ upstream infra, rdo-infra and a few CI >> > engineers. 
>> > >> >> > >> There are a few ways to mitigate the issue however I don't >> > see any of the options being completed very quickly so I'm >> > asking for your patience while this issue is socialized and >> > resolved. >> > >> >> > >> For full transparency we're considering the following options. >> > >> >> > >> 1. move off of docker.io to quay.io >> > >> > > >> > > >> > > quay.io also has API rate limit: >> > > https://docs.quay.io/issues/429.html >> > > >> > > Now I'm not sure about how many requests per seconds one can >> > do vs the other but this would need to be checked with the quay >> > team before changing anything. >> > > Also quay.io had its big downtimes as well, >> > SLA needs to be considered. >> > > >> > >> 2. local container builds for each job in master, possibly >> > ussuri >> > > >> > > >> > > Not convinced. >> > > You can look at CI logs: >> > > - pulling / updating / pushing container images from >> > docker.io to local registry takes ~10 min on >> > standalone (OVH) >> > > - building containers from scratch with updated repos and >> > pushing them to local registry takes ~29 min on standalone (OVH). >> > > >> > >> >> > >> 3. parent child jobs upstream where rpms and containers will >> > be build and host artifacts for the child jobs >> > > >> > > >> > > Yes, we need to investigate that. >> > > >> > >> >> > >> 4. remove some portion of the upstream jobs to lower the >> > impact we have on 3rd party infrastructure. >> > > >> > > >> > > I'm not sure I understand this one, maybe you can give an >> > example of what could be removed? >> > >> > We need to re-evaulate our use of scenarios (e.g. we have two >> > scenario010's both are non-voting). There's a reason we >> > historically >> > didn't want to add more jobs because of these types of resource >> > constraints. I think we've added new jobs recently and likely >> > need to >> > reduce what we run. Additionally we might want to look into reducing >> > what we run on stable branches as well. >> > >> > >> > Oh... removing jobs (I thought we would remove some steps of the jobs). >> > Yes big +1, this should be a continuous goal when working on CI, and >> > always evaluating what we need vs what we run now. >> > >> > We should look at: >> > 1) services deployed in scenarios that aren't worth testing (e.g. >> > deprecated or unused things) (and deprecate the unused things) >> > 2) jobs themselves (I don't have any example beside scenario010 but >> > I'm sure there are more). >> > -- >> > Emilien Macchi >> > >> > >> > Thanks Alex, Emilien >> > >> > +1 to reviewing the catalog and adjusting things on an ongoing basis. >> > >> > All.. it looks like the issues with docker.io were >> > more of a flare up than a change in docker.io policy >> > or infrastructure [2]. The flare up started on July 27 8am utc and >> > ended on July 27 17:00 utc, see screenshots. >> >> The numbers of image prepare workers and its exponential fallback >> intervals should be also adjusted. I've analysed the log snippet [0] for >> the connection reset counts by workers versus the times the rate >> limiting was triggered. See the details in the reported bug [1]. >> >> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: >> >> Conn Reset Counts by a Worker PID: >> 3 58412 >> 2 58413 >> 3 58415 >> 3 58417 >> >> which seems too much of (workers*reconnects) and triggers rate limiting >> immediately. 
>> >> [0] >> https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 >> >> -- >> Best regards, >> Bogdan Dobrelya, >> Irc #bogdando >> > > FYI.. > > The issue w/ "too many requests" is back. Expect delays and failures in attempting to merge your patches upstream across all branches. The issue is being tracked as a critical issue. Working with the infra folks and we have identified the authorization header as causing issues when we're rediected from docker.io to cloudflare. I'll throw up a patch tomorrow to handle this case which should improve our usage of the cache. It needs some testing against other registries to ensure that we don't break authenticated fetching of resources. From whayutin at redhat.com Wed Jul 29 22:48:52 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 29 Jul 2020 16:48:52 -0600 Subject: [tripleo][ci] container pulls failing In-Reply-To: References: Message-ID: On Wed, Jul 29, 2020 at 4:33 PM Alex Schultz wrote: > On Wed, Jul 29, 2020 at 7:13 AM Wesley Hayutin > wrote: > > > > > > > > On Wed, Jul 29, 2020 at 2:25 AM Bogdan Dobrelya > wrote: > >> > >> On 7/28/20 6:09 PM, Wesley Hayutin wrote: > >> > > >> > > >> > On Tue, Jul 28, 2020 at 7:24 AM Emilien Macchi >> > > wrote: > >> > > >> > > >> > > >> > On Tue, Jul 28, 2020 at 9:20 AM Alex Schultz >> > > wrote: > >> > > >> > On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi > >> > > wrote: > >> > > > >> > > > >> > > > >> > > On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin > >> > > wrote: > >> > >> > >> > >> FYI... > >> > >> > >> > >> If you find your jobs are failing with an error similar to > >> > [1], you have been rate limited by docker.io < > http://docker.io> > >> > via the upstream mirror system and have hit [2]. I've been > >> > discussing the issue w/ upstream infra, rdo-infra and a few CI > >> > engineers. > >> > >> > >> > >> There are a few ways to mitigate the issue however I don't > >> > see any of the options being completed very quickly so I'm > >> > asking for your patience while this issue is socialized and > >> > resolved. > >> > >> > >> > >> For full transparency we're considering the following > options. > >> > >> > >> > >> 1. move off of docker.io to quay.io > >> > > >> > > > >> > > > >> > > quay.io also has API rate limit: > >> > > https://docs.quay.io/issues/429.html > >> > > > >> > > Now I'm not sure about how many requests per seconds one > can > >> > do vs the other but this would need to be checked with the > quay > >> > team before changing anything. > >> > > Also quay.io had its big downtimes as > well, > >> > SLA needs to be considered. > >> > > > >> > >> 2. local container builds for each job in master, possibly > >> > ussuri > >> > > > >> > > > >> > > Not convinced. > >> > > You can look at CI logs: > >> > > - pulling / updating / pushing container images from > >> > docker.io to local registry takes ~10 min > on > >> > standalone (OVH) > >> > > - building containers from scratch with updated repos and > >> > pushing them to local registry takes ~29 min on standalone > (OVH). > >> > > > >> > >> > >> > >> 3. parent child jobs upstream where rpms and containers > will > >> > be build and host artifacts for the child jobs > >> > > > >> > > > >> > > Yes, we need to investigate that. > >> > > > >> > >> > >> > >> 4. 
remove some portion of the upstream jobs to lower the > >> > impact we have on 3rd party infrastructure. > >> > > > >> > > > >> > > I'm not sure I understand this one, maybe you can give an > >> > example of what could be removed? > >> > > >> > We need to re-evaulate our use of scenarios (e.g. we have two > >> > scenario010's both are non-voting). There's a reason we > >> > historically > >> > didn't want to add more jobs because of these types of > resource > >> > constraints. I think we've added new jobs recently and likely > >> > need to > >> > reduce what we run. Additionally we might want to look into > reducing > >> > what we run on stable branches as well. > >> > > >> > > >> > Oh... removing jobs (I thought we would remove some steps of the > jobs). > >> > Yes big +1, this should be a continuous goal when working on CI, > and > >> > always evaluating what we need vs what we run now. > >> > > >> > We should look at: > >> > 1) services deployed in scenarios that aren't worth testing (e.g. > >> > deprecated or unused things) (and deprecate the unused things) > >> > 2) jobs themselves (I don't have any example beside scenario010 > but > >> > I'm sure there are more). > >> > -- > >> > Emilien Macchi > >> > > >> > > >> > Thanks Alex, Emilien > >> > > >> > +1 to reviewing the catalog and adjusting things on an ongoing basis. > >> > > >> > All.. it looks like the issues with docker.io were > >> > more of a flare up than a change in docker.io > policy > >> > or infrastructure [2]. The flare up started on July 27 8am utc and > >> > ended on July 27 17:00 utc, see screenshots. > >> > >> The numbers of image prepare workers and its exponential fallback > >> intervals should be also adjusted. I've analysed the log snippet [0] for > >> the connection reset counts by workers versus the times the rate > >> limiting was triggered. See the details in the reported bug [1]. > >> > >> tl;dr -- for an example 5 sec interval 03:55:31,379 - 03:55:36,110: > >> > >> Conn Reset Counts by a Worker PID: > >> 3 58412 > >> 2 58413 > >> 3 58415 > >> 3 58417 > >> > >> which seems too much of (workers*reconnects) and triggers rate limiting > >> immediately. > >> > >> [0] > >> > https://13b475d7469ed7126ee9-28d4ad440f46f2186fe3f98464e57890.ssl.cf1.rackcdn.com/741228/6/check/tripleo-ci-centos-8-undercloud-containers/8e47836/logs/undercloud/var/log/tripleo-container-image-prepare.log > >> > >> [1] https://bugs.launchpad.net/tripleo/+bug/1889372 > >> > >> -- > >> Best regards, > >> Bogdan Dobrelya, > >> Irc #bogdando > >> > > > > FYI.. > > > > The issue w/ "too many requests" is back. Expect delays and failures in > attempting to merge your patches upstream across all branches. The issue > is being tracked as a critical issue. > > Working with the infra folks and we have identified the authorization > header as causing issues when we're rediected from docker.io to > cloudflare. I'll throw up a patch tomorrow to handle this case which > should improve our usage of the cache. It needs some testing against > other registries to ensure that we don't break authenticated fetching > of resources. > > Thanks Alex! -------------- next part -------------- An HTML attachment was scrubbed... 
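To illustrate the redirect behaviour identified at the end of this thread, a minimal Python sketch (purely illustrative; the function name and flow are assumptions and this is not the tripleo-common or mirror code) of fetching a registry blob while following the CDN redirect by hand, so the registry Authorization header is not forwarded to the CDN host:

import requests

def fetch_blob(url, token):
    # Ask the registry for the blob without auto-following redirects.
    resp = requests.get(url,
                        headers={"Authorization": "Bearer %s" % token},
                        allow_redirects=False)
    if resp.status_code in (301, 302, 303, 307, 308):
        # The registry answers with a CDN location; re-request that URL
        # without the Authorization header instead of forwarding the credential.
        resp = requests.get(resp.headers["Location"])
    resp.raise_for_status()
    return resp.content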
URL: From jackmin at mellanox.com Thu Jul 30 07:06:37 2020 From: jackmin at mellanox.com (Xiaoyu Min) Date: Thu, 30 Jul 2020 15:06:37 +0800 Subject: [ironic] ironic-python-agent failed to lookup node Message-ID: <20200730070637.4jxznouoricamabe@mellanox.com> Hello Experts: I'm new to the openstack and trying to deploy one baremental node but it failed on "wait call-back" stage. I checked IPA log on deploy image and it reports the following error [1]. I aslo manually run the command [2] it gave me the same error "404". I used the devstack to depoly openstack (stable/ussuri). What's the possible problem? Any suggestion is appreciated. -Jack [1]: DEBUG ironic_python_agent.ironic_api_client [-] Looking up node with addresses 'e4:43:4b:93:4c:12,e4:43:4b:93:4c:10,0c:42:a1:56:b8:c5,e4:43:4b:93:4c:13,e4:43:4b:93:4c:11,0c:42:a1:56:b8:c4' and UUID None at http://10.75.205.241/baremetal _do_lookup /usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py:163[00m ERROR ironic_python_agent.ironic_api_client [-] An error occurred while attempting to discover the available Ironic API versions, falling back to using version 1.31: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) ERROR ironic_python_agent.ironic_api_client Traceback (most recent call last): ERROR ironic_python_agent.ironic_api_client File "/usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py", line 97, in _get_ironic_api_version ERROR ironic_python_agent.ironic_api_client data = jsonutils.loads(response.content) ERROR ironic_python_agent.ironic_api_client File "/usr/lib/python3.6/site-packages/oslo_serialization/jsonutils.py", line 248, in loads ERROR ironic_python_agent.ironic_api_client return json.loads(encodeutils.safe_decode(s, encoding), **kwargs) ERROR ironic_python_agent.ironic_api_client File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads ERROR ironic_python_agent.ironic_api_client return _default_decoder.decode(s) ERROR ironic_python_agent.ironic_api_client File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode ERROR ironic_python_agent.ironic_api_client obj, end = self.raw_decode(s, idx=_w(s, 0).end()) ERROR ironic_python_agent.ironic_api_client File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode ERROR ironic_python_agent.ironic_api_client raise JSONDecodeError("Expecting value", s, err.value) from None ERROR ironic_python_agent.ironic_api_client json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) ERROR ironic_python_agent.ironic_api_client [00m WARNING ironic_python_agent.ironic_api_client [-] Failed looking up node with addresses 'e4:43:4b:93:4c:12,e4:43:4b:93:4c:10,0c:42:a1:56:b8:c5,e4:43:4b:93:4c:13,e4:43:4b:93:4c:11,0c:42:a1:56:b8:c4' at http://10.75.205.241/baremetal, status code: 404[00m]]] [2]: curl http://10.75.205.241/baremetal [...snip...] 404 Not Found From e0ne at e0ne.info Thu Jul 30 10:00:03 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Thu, 30 Jul 2020 13:00:03 +0300 Subject: [horizon] Victoria virtual mid-cycle poll In-Reply-To: References: Message-ID: Hi team, If something can go wrong, it will definitely go wrong. It means that I did a mistake in my original mail and sent you completely wrong dates:(. Horizon Virtual mid-cycle is supposed to be next week Aug 5-7. I'm planning to have a single one-hour session. In case, if we've got a lot of participants and topic to discuss, we can schedule one more session a week or two weeks later. 
Here is a correct poll: https://doodle.com/poll/dkmsai49v4zzpca2 Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Wed, Jul 22, 2020 at 10:26 AM Ivan Kolodyazhny wrote: > Hi team, > > As discussed at Horizon's Virtual PTG [1], we'll have a virtual mid-cycle > meeting around Victoria-2 milestone. > > We'll discuss Horizon current cycle development priorities and the future > of Horizon with modern JS frameworks. > > Please indicate your availability to meet for the first session, which > will be held during the week of July 27-31: > > https://doodle.com/poll/3neps94amcreaw8q > > Please respond before 12:00 UTC on Tuesday 4 August. > > [1] https://etherpad.opendev.org/p/horizon-v-ptg > > Regards, > Ivan Kolodyazhny, > http://blog.e0ne.info/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Jul 30 11:13:14 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 30 Jul 2020 13:13:14 +0200 Subject: [neutron] Drivers meeting 31.07.2020 - agenda Message-ID: Hi, For tomorrow’s drivers meeting we have couple of items to discuss. Full agenda is at https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda but below is short summary also. RFEs: https://bugs.launchpad.net/neutron/+bug/1889431 - [RFE] Add local-ip-prefix to Neutron metering label rules https://bugs.launchpad.net/neutron/+bug/1880532 - [RFE]L3 Router should support ECMP This one was already discussed in the past but now we know it can be done (almost) without changes in existing API https://bugs.launchpad.net/neutron/+bug/1888487 - Change the default value of "propagate_uplink_status" to True We also have 2 items added by Rodolfo to the "On demand” section: • (ralonsoh) https://bugs.launchpad.net/neutron/+bug/1887523: a good proposal to improve the DB performance, that could be done in parallel to https://review.opendev.org/#/c/739139/ • (ralonsoh) https://bugs.launchpad.net/neutron/+bug/1888829: Improve core plugin extension filtering using the mechanism driver information (in other words: if loaded mech drivers do not support any ML2plugin extension, this extension will be gracefully removed). — Slawek Kaplonski Principal software engineer Red Hat From mihalis68 at gmail.com Thu Jul 30 12:52:15 2020 From: mihalis68 at gmail.com (Chris Morgan) Date: Thu, 30 Jul 2020 08:52:15 -0400 Subject: [ops] Reviving OSOps ? In-Reply-To: References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> Message-ID: +1 to put these in the Operations Docs SIG On Wed, Jul 29, 2020 at 12:25 AM Fabian Zimmermann wrote: > +1 > > Laurent Dumont schrieb am Mi., 29. Juli 2020, > 04:00: > >> Interested in this as well. We use Openstack a $Dayjob :) >> >> On Mon, Jul 27, 2020 at 2:52 PM Amy Marrich wrote: >> >>> +1 on combining this in with the existing SiG and efforts. >>> >>> Amy (spotz) >>> >>> On Mon, Jul 27, 2020 at 1:02 PM Sean McGinnis >>> wrote: >>> >>>> >>>> >> If Osops should be considered distinct from OpenStack >>>> > >>>> > That feels like the wrong statement to make, even if only implicitly >>>> > by repo organization. Is there a compelling reason not to have osops >>>> > under the openstack namespace? >>>> > >>>> I think it makes the most sense to be under the openstack namespace. >>>> >>>> We have the Operations Docs SIG right now that took on some of the >>>> operator-specific documentation that no longer had a home. 
This was a >>>> consistent issue brought up in the Ops Meetup events. While not "wildly >>>> successful" in getting a bunch of new and updated docs, it at least has >>>> accomplished the main goal of getting these docs published to >>>> docs.openstack.org again, and providing a place where more >>>> collaboration >>>> can (and occasionally does) happen to improve those docs. >>>> >>>> I think we could probably expand the scope of this SIG. Especially >>>> considering it is a pretty low-volume SIG anyway. I would be good with >>>> changing this to something like the "Operator Docs and Tooling SIG" and >>>> getting any of these useful tooling repos under governance through that. >>>> I personally wouldn't be able to spend a lot of time working on anything >>>> under the SIG, but I'd be happy to keep an eye out for any new reviews >>>> and help get those through. >>>> >>>> Sean >>>> >>>> >>>> -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Thu Jul 30 13:05:00 2020 From: amy at demarco.com (Amy Marrich) Date: Thu, 30 Jul 2020 08:05:00 -0500 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Adding the discuss list where you might get more help, but also double check your config file for any extra spaces or typos. Thanks, Amy (spotz) On Thu, Jul 30, 2020 at 6:30 AM Monika Samal wrote: > Hello All, > > I have been following > https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html > to deploy *Octavia. *I was successful in deployin Octavia but when I go > to horizon dashboard and create loadbalancer am getting error *"9876/v2.0/lbaas/loadbalancers, > Internal Server Error". *I have checked worker log* at > /var/log/kola-ansible/Octavia-worker.log *and found oslo messaging was > refusing connection I fixed it but still getting same error. Kindly help > > Regards, > Moni > _______________________________________________ > Community mailing list > Community at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/community > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Thu Jul 30 13:21:43 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 30 Jul 2020 08:21:43 -0500 Subject: [cloudkitty][tc] Cloudkitty abandoned? Message-ID: Posting here to raise awareness, and start discussion about next steps. It appears there is no one working on Cloudkitty anymore. No patches have been merged for several months now, including simple bot proposed patches. It would appear no one is maintaining this project anymore. I know there is a need out there for this type of functionality, so maybe this will raise awareness and get some attention to it. But barring that, I am wondering if we should start the process to retire this project. From a Victoria release perspective, it is milestone-2 week, so we should make a decision if any of the Cloudkitty deliverables should be included in this release or not. We can certainly force releases of whatever is the latest, but I think that is a bit risky since these repos have never merged the job template change for victoria and therefore are not even testing with Python 3.8. That is an official runtime for Victoria, so we run the risk of having issues with the code if someone runs under 3.8 but we have not tested to make sure there are no problems doing so. I am hoping this at least starts the discussion. 
I will not propose any release patches to remove anything until we have had a chance to discuss the situation. Sean From rafaelweingartner at gmail.com Thu Jul 30 13:42:00 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 30 Jul 2020 10:42:00 -0300 Subject: [cloudkitty][tc] Cloudkitty abandoned? In-Reply-To: References: Message-ID: We are working on it. So far we have 3 open proposals there, but we do not have enough karma to move things along. Besides these 3 open proposals, we do have more ongoing extensions that have not yet been proposed to the community. On Thu, Jul 30, 2020 at 10:22 AM Sean McGinnis wrote: > Posting here to raise awareness, and start discussion about next steps. > > It appears there is no one working on Cloudkitty anymore. No patches > have been merged for several months now, including simple bot proposed > patches. It would appear no one is maintaining this project anymore. > > I know there is a need out there for this type of functionality, so > maybe this will raise awareness and get some attention to it. But > barring that, I am wondering if we should start the process to retire > this project. > > From a Victoria release perspective, it is milestone-2 week, so we > should make a decision if any of the Cloudkitty deliverables should be > included in this release or not. We can certainly force releases of > whatever is the latest, but I think that is a bit risky since these > repos have never merged the job template change for victoria and > therefore are not even testing with Python 3.8. That is an official > runtime for Victoria, so we run the risk of having issues with the code > if someone runs under 3.8 but we have not tested to make sure there are > no problems doing so. > > I am hoping this at least starts the discussion. I will not propose any > release patches to remove anything until we have had a chance to discuss > the situation. > > Sean > > > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Thu Jul 30 15:16:47 2020 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 30 Jul 2020 10:16:47 -0500 Subject: [neutron] Drivers meeting 31.07.2020 - agenda In-Reply-To: References: Message-ID: Slawek, I might not be able to attend this time around Cheers On Thu, Jul 30, 2020 at 6:13 AM Slawek Kaplonski wrote: > Hi, > > For tomorrow’s drivers meeting we have couple of items to discuss. > Full agenda is at > https://wiki.openstack.org/wiki/Meetings/NeutronDrivers#Agenda but below > is short summary also. 
> > RFEs: > https://bugs.launchpad.net/neutron/+bug/1889431 - [RFE] Add > local-ip-prefix to Neutron metering label rules > > https://bugs.launchpad.net/neutron/+bug/1880532 - [RFE]L3 Router should > support ECMP > This one was already discussed in the past but now we know it can > be done (almost) without changes in existing API > > https://bugs.launchpad.net/neutron/+bug/1888487 - Change the default > value of "propagate_uplink_status" to True > > We also have 2 items added by Rodolfo to the "On demand” section: > • (ralonsoh) https://bugs.launchpad.net/neutron/+bug/1887523: a > good proposal to improve the DB performance, that could be done in parallel > to https://review.opendev.org/#/c/739139/ > • (ralonsoh) https://bugs.launchpad.net/neutron/+bug/1888829: > Improve core plugin extension filtering using the mechanism driver > information (in other words: if loaded mech drivers do not support any > ML2plugin extension, this extension will be gracefully removed). > > — > Slawek Kaplonski > Principal software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Thu Jul 30 15:25:13 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 30 Jul 2020 20:55:13 +0530 Subject: [Glance] Proposing Dan Smith for glance core Message-ID: Hi All, I'd like to propose adding Dan Smith to the glance core group. Dan Smith has contributed to stabilize image import workflow as well as multiple stores of glance. He is also contributing in tempest and nova to set up CI/tempest jobs around image import and multiple stores. Being involved on the mailing-list and IRC channels, Dan is always helpful to the community and here to help. Please respond with +1/-1 until 03rd August, 2020 1400 UTC. Cheers, Abhishek -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Thu Jul 30 15:41:49 2020 From: emilien at redhat.com (Emilien Macchi) Date: Thu, 30 Jul 2020 11:41:49 -0400 Subject: [tripleo] manual about TripleO core reviewers Message-ID: Hi people, I've started a documentation patch about our core reviewers. Core or not, please be involved in Gerrit directly. https://review.opendev.org/#/c/743999 Hopefully it'll help people to understand about expectations but also give a refresh to current cores. Note: this was inspired by https://wiki.openstack.org/wiki/Ironic/CoreTeam - Thanks a lot to the Ironic team! Thanks, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex.williamson at redhat.com Wed Jul 29 19:12:55 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Wed, 29 Jul 2020 13:12:55 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> Message-ID: <20200729131255.68730f68@x1.home> On Wed, 29 Jul 2020 12:28:46 +0100 Sean Mooney wrote: > On Wed, 2020-07-29 at 16:05 +0800, Yan Zhao wrote: > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > On Mon, 27 Jul 2020 15:24:40 +0800 > > > Yan Zhao wrote: > > > > > > > > > As you indicate, the vendor driver is responsible for checking version > > > > > > information embedded within the migration stream. Therefore a > > > > > > migration should fail early if the devices are incompatible. Is it > > > > > > > > > > but as I know, currently in VFIO migration protocol, we have no way to > > > > > get vendor specific compatibility checking string in migration setup stage > > > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > > > In this way, for devices who does not save device data in precopy stage, > > > > > the migration compatibility checking is as late as in stop-and-copy > > > > > stage, which is too late. > > > > > do you think we need to add the getting/checking of vendor specific > > > > > compatibility string early in save_setup stage? > > > > > > > > > > > > > hi Alex, > > > > after an offline discussion with Kevin, I realized that it may not be a > > > > problem if migration compatibility check in vendor driver occurs late in > > > > stop-and-copy phase for some devices, because if we report device > > > > compatibility attributes clearly in an interface, the chances for > > > > libvirt/openstack to make a wrong decision is little. > > > > > > I think it would be wise for a vendor driver to implement a pre-copy > > > phase, even if only to send version information and verify it at the > > > target. Deciding you have no device state to send during pre-copy does > > > not mean your vendor driver needs to opt-out of the pre-copy phase > > > entirely. Please also note that pre-copy is at the user's discretion, > > > we've defined that we can enter stop-and-copy at any point, including > > > without a pre-copy phase, so I would recommend that vendor drivers > > > validate compatibility at the start of both the pre-copy and the > > > stop-and-copy phases. > > > > > > > ok. got it! > > > > > > so, do you think we are now arriving at an agreement that we'll give up > > > > the read-and-test scheme and start to defining one interface (perhaps in > > > > json format), from which libvirt/openstack is able to parse and find out > > > > compatibility list of a source mdev/physical device? > > > > > > Based on the feedback we've received, the previously proposed interface > > > is not viable. I think there's agreement that the user needs to be > > > able to parse and interpret the version information. Using json seems > > > viable, but I don't know if it's the best option. Is there any > > > precedent of markup strings returned via sysfs we could follow? > > > > I found some examples of using formatted string under /sys, mostly under > > tracing. 
maybe we can do a similar implementation. > > > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format > > > > name: kvm_mmio > > ID: 32 > > format: > > field:unsigned short common_type; offset:0; size:2; signed:0; > > field:unsigned char common_flags; offset:2; size:1; signed:0; > > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > > field:int common_pid; offset:4; size:4; signed:1; > > > > field:u32 type; offset:8; size:4; signed:0; > > field:u32 len; offset:12; size:4; signed:0; > > field:u64 gpa; offset:16; size:8; signed:0; > > field:u64 val; offset:24; size:8; signed:0; > > > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" > > }, { 2, "write" }), REC->len, REC->gpa, REC->val > > > this is not json fromat and its not supper frendly to parse. > > > > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent > > DRIVER=vfio-pci > > PCI_CLASS=30000 > > PCI_ID=8086:591D > > PCI_SUBSYS_ID=8086:2212 > > PCI_SLOT_NAME=0000:00:02.0 > > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > > > this is ini format or conf formant > this is pretty simple to parse whichi would be fine. > that said you could also have a version or capablitiy directory with a file > for each key and a singel value. > > i would prefer to only have to do one read personally the list the files in > directory and then read tehm all ot build the datastucture myself but that is > doable though the simple ini format use d for uevent seams the best of 3 options > provided above. > > > > > > Your idea of having both a "self" object and an array of "compatible" > > > objects is perhaps something we can build on, but we must not assume > > > PCI devices at the root level of the object. Providing both the > > > mdev-type and the driver is a bit redundant, since the former includes > > > the latter. We can't have vendor specific versioning schemes though, > > > ie. gvt-version. We need to agree on a common scheme and decide which > > > fields the version is relative to, ex. just the mdev type? > > > > what about making all comparing fields vendor specific? > > userspace like openstack only needs to parse and compare if target > > device is within source compatible list without understanding the meaning > > of each field. > that kind of defeats the reason for having them be be parsable. > the reason openstack want to be able to understand the capablitys is so > we can staticaly declare the capablit of devices ahead of time on so our schduler > can select host based on that. is the keys and data are opaquce to userspace > becaue they are just random vendor sepecific blobs we cant do that. Agreed, I'm not sure I'm willing to rule out that there could be vendor specific direct match fields, as I included in my example earlier in the thread, but entirely vendor specific defeats much of the purpose here. > > > I had also proposed fields that provide information to create a > > > compatible type, for example to create a type_x2 device from a type_x1 > > > mdev type, they need to know to apply an aggregation attribute. If we > > > need to explicitly list every aggregation value and the resulting type, > > > I think we run aground of what aggregation was trying to avoid anyway, > > > so we might need to pick a language that defines variable substitution > > > or some kind of tagging. 
For example if we could define ${aggr} as an > > > integer within a specified range, then we might be able to define a type > > > relative to that value (type_x${aggr}) which requires an aggregation > > > attribute using the same value. I dunno, just spit balling. Thanks, > > > > what about a migration_compatible attribute under device node like > > below? > rather then listing comaptiable devices it would be better if you could declaritivly > list the feature supported and we could compare those along with a simple semver version string. > > > > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible Note that we're defining compatibility relative to a vfio migration interface, so we should include that in the name, we don't know what other migration interfaces might exist. > > SELF: > > device_type=pci Why not the device_api here, ie. vfio-pci. The device doesn't provide a pci interface directly, it's wrapped in a vfio API. > > device_id=8086591d Is device_id interpreted relative to device_type? How does this relate to mdev_type? If we have an mdev_type, doesn't that fully defined the software API? > > mdev_type=i915-GVTg_V5_2 And how are non-mdev devices represented? > > aggregator=1 > > pv_mode="none+ppgtt+context" These are meaningless vendor specific matches afaict. > > interface_version=3 Not much granularity here, I prefer Sean's previous .[.bugfix] scheme. > > COMPATIBLE: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > this mixed notation will be hard to parse so i would avoid that. Some background, Intel has been proposing aggregation as a solution to how we scale mdev devices when hardware exposes large numbers of assignable objects that can be composed in essentially arbitrary ways. So for instance, if we have a workqueue (wq), we might have an mdev type for 1wq, 2wq, 3wq,... Nwq. It's not really practical to expose a discrete mdev type for each of those, so they want to define a base type which is composable to other types via this aggregation. This is what this substitution and tagging is attempting to accomplish. So imagine this set of values for cases where it's not practical to unroll the values for N discrete types. > > aggregator={val1}/2 So the {val1} above would be substituted here, though an aggregation factor of 1/2 is a head scratcher... > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} I'm lost on this one though. I think maybe it's indicating that it's compatible with any of these, so do we need to list it? Couldn't this be handled by Sean's version proposal where the minor version represents feature compatibility? > > > > interface_version={val3:int:2,3} What does this turn into in a few years, 2,7,12,23,75,96,... > > COMPATIBLE: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > aggregator={val1}/2 > > pv_mode="" #"" meaning empty, could be absent in a compatible device > > interface_version=1 Why can't this be represented within the previous compatible description? 
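To spell out what I mean by preferring a major.minor[.bugfix] style string over an open-ended list of integers, here is a minimal sketch of the semantics I'm assuming (equal major means the same migration stream layout, a higher minor means a strict superset of features, and the bugfix part carries no compatibility meaning). This is purely illustrative, not a proposal for where such code would live:

def parse_version(text):
    # "major.minor" or "major.minor.bugfix"; the bugfix part only
    # describes errata and carries no compatibility meaning here.
    parts = [int(p) for p in text.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    return major, minor

def stream_compatible(source, target):
    # Assumed rule: same major means the same migration stream layout,
    # and a target with an equal or higher minor understands everything
    # the source may have put in the stream.
    s_major, s_minor = parse_version(source)
    t_major, t_minor = parse_version(target)
    return s_major == t_major and t_minor >= s_minor

assert stream_compatible("2.1", "2.3")       # newer minor can accept older
assert not stream_compatible("2.1", "3.0")   # major bump breaks compatibility

The point is that the producer only ever advertises one string per interface, and the consumer never needs an ever-growing enumeration to decide whether a stream can be accepted.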
> if you presented this information the only way i could see to use it would be to > extract the mdev_type name and interface_vertion and build a database table as follows > > source_mdev_type | source_version | target_mdev_type | target_version > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | {val3:int:2,3} > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | 1 > > this would either reuiqre use to use a post placment sechudler filter to itrospec this data base > or thansform the target_mdev_type and target_version colum data into CUSTOM_* traits we apply to > our placment resouce providers and we would have to prefrom multiple reuqest for each posible compatiable > alternitive. if the vm has muplite mdevs this is combinatorially problmenatic as it is 1 query for each > device * the number of possible compatible devices for that device. > > in other word if this is just opaque data we cant ever represent it efficently in our placment service and > have to fall back to an explisive post placment schdluer filter base on the db table approch. > > this also ignore the fact that at present the mdev_type cannot change druing a migration so the compatiable > devicve with a different mdev type would not be considerd accpetable choice in openstack. they way you select a host > with a specific vgpu mdev type today is to apply a custome trait which is CUSTOM_ to the vGPU > resouce provider and then in the flavor you request 1 allcoaton of vGPU and require the CUSTOM_ > trait. so going form i915-GVTg_V5_2 to i915-GVTg_V5_{val1:int:1,2,4,8} would not currently be compatiable with that > workflow. The latter would need to be parsed into: i915-GVTg_V5_1 i915-GVTg_V5_2 i915-GVTg_V5_4 i915-GVTg_V5_8 There is also on the table, migration from physical devices to mdev devices (or vice versa), which is not represented in these examples, nor do I see how we'd represent it. This is where I started exposing the resulting PCI device from the mdev in my example so we could have some commonality between devices, but the migration stream provider is just as important as the type of device, we could have different host drivers providing the same device with incompatible migration streams. The mdev_type encompasses both the driver and device, but we wouldn't have mdev_types for physical devices, per our current thinking. > > #cat /sys/bus/pci/dei915-GVTg_V5_{val1:int:1,2,4,8}vices/0000\:00\:i915- > > GVTg_V5_{val1:int:1,2,4,8}2.0/UUID2/migration_compatible > > SELF: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_4 > > aggregator=2 > > interface_version=1 > > COMPATIBLE: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > aggregator={val1}/2 > > interface_version=1 > by the way this is closer to yaml format then it is to json but it does not align with any exsiting > format i know of so that just make the representation needless hard to consume > if we are going to use a markup lanag let use a standard one like yaml json or toml and not invent a new one. > > > > Notes: > > - A COMPATIBLE object is a line starting with COMPATIBLE. > > It specifies a list of compatible devices that are allowed to migrate > > in. > > The reason to allow multiple COMPATIBLE objects is that when it > > is hard to express a complex compatible logic in one COMPATIBLE > > object, a simple enumeration is still a fallback. 
> > in the above example, device UUID2 is in the compatible list of > > device UUID1, but device UUID1 is not in the compatible list of device > > UUID2, so device UUID2 is able to migrate to device UUID1, but device > > UUID1 is not able to migrate to device UUID2. > > > > - fields under each object are of "and" relationship to each other, meaning > > all fields of SELF object of a target device must be equal to corresponding > > fields of a COMPATIBLE object of source device, otherwise it is regarded as not > > compatible. > > > > - each field, however, is able to specify multiple allowed values, using > > variables as explained below. > > > > - variables are represented with {}, the first appearance of one variable > > specifies its type and allowed list. e.g. > > {val1:int:1,2,4,8} represents var1 whose type is integer and allowed > > values are 1, 2, 4, 8. > > > > - vendors are able to specify which fields are within the comparing list > > and which fields are not. e.g. for physical VF migration, it may not > > choose mdev_type as a comparing field, and maybe use driver name instead. > this format might be useful to vendors but from a orcestrator > perspecive i dont think this has value to us likely we would not use > this api if it was added as it does not help us with schduling. > ideally instead fo declaring which other mdev types a device is > compatiable with (which could presumably change over time as new > device and firmwares are released) i would prefer to see a > declaritive non vendor specific api that declares the feature set > provided by each mdev_type from which we can infer comaptiablity > similar to cpu feature flags. for devices fo the same mdev_type name > addtionally a declaritive version sting could also be used if > required for addtional compatiablity checks. "non vendor specific api that declares the feature set", aren't features generally vendor specific? What we're trying to describe is, by it's very nature, vendor specific. We don't have an ISO body defining a graphics adapter and enumerating features for that adapter. I think what we have is mdev_types. Each type is supposed to define a specific software interface, perhaps even more so than is done by a PCI vendor:device ID. Maybe that mdev_type needs to be abstracted as something more like a vendor signature, such that a physical device could provide or accept a vendor signature that's compatible with an mdev device. For example, a physically assigned Intel GPU might expose a migration signature of i915-GVTg_v5_8 if it were designed to be compatible with that mdev_type. 
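As an aside, the unrolling mentioned above is mechanical enough; a rough sketch (the {valN:int:...} notation is taken from Yan's example, everything else here is assumed):

import re

# Illustrative only: expand the proposed {valN:int:a,b,c} template notation
# into the discrete type names a scheduler would actually have to know about.
_VAR = re.compile(r"\{(\w+):int:([\d,]+)\}")

def unroll(template):
    match = _VAR.search(template)
    if not match:
        return [template]
    names = []
    for value in match.group(2).split(","):
        expanded = template[:match.start()] + value + template[match.end():]
        names.extend(unroll(expanded))
    return names

print(unroll("i915-GVTg_V5_{val1:int:1,2,4,8}"))
# ['i915-GVTg_V5_1', 'i915-GVTg_V5_2', 'i915-GVTg_V5_4', 'i915-GVTg_V5_8']

The expansion itself is trivial; the awkward part is that aggregator={val1}/2 couples a second attribute to whichever value was chosen, so the consumer has to carry the substitution context around rather than ending up with a flat list of type names.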
Thanks, Alex From yan.y.zhao at intel.com Thu Jul 30 01:56:39 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 30 Jul 2020 09:56:39 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> Message-ID: <20200730015639.GA32327@joy-OptiPlex-7040> On Wed, Jul 29, 2020 at 12:28:46PM +0100, Sean Mooney wrote: > On Wed, 2020-07-29 at 16:05 +0800, Yan Zhao wrote: > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > On Mon, 27 Jul 2020 15:24:40 +0800 > > > Yan Zhao wrote: > > > > > > > > > As you indicate, the vendor driver is responsible for checking version > > > > > > information embedded within the migration stream. Therefore a > > > > > > migration should fail early if the devices are incompatible. Is it > > > > > > > > > > but as I know, currently in VFIO migration protocol, we have no way to > > > > > get vendor specific compatibility checking string in migration setup stage > > > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > > > In this way, for devices who does not save device data in precopy stage, > > > > > the migration compatibility checking is as late as in stop-and-copy > > > > > stage, which is too late. > > > > > do you think we need to add the getting/checking of vendor specific > > > > > compatibility string early in save_setup stage? > > > > > > > > > > > > > hi Alex, > > > > after an offline discussion with Kevin, I realized that it may not be a > > > > problem if migration compatibility check in vendor driver occurs late in > > > > stop-and-copy phase for some devices, because if we report device > > > > compatibility attributes clearly in an interface, the chances for > > > > libvirt/openstack to make a wrong decision is little. > > > > > > I think it would be wise for a vendor driver to implement a pre-copy > > > phase, even if only to send version information and verify it at the > > > target. Deciding you have no device state to send during pre-copy does > > > not mean your vendor driver needs to opt-out of the pre-copy phase > > > entirely. Please also note that pre-copy is at the user's discretion, > > > we've defined that we can enter stop-and-copy at any point, including > > > without a pre-copy phase, so I would recommend that vendor drivers > > > validate compatibility at the start of both the pre-copy and the > > > stop-and-copy phases. > > > > > > > ok. got it! > > > > > > so, do you think we are now arriving at an agreement that we'll give up > > > > the read-and-test scheme and start to defining one interface (perhaps in > > > > json format), from which libvirt/openstack is able to parse and find out > > > > compatibility list of a source mdev/physical device? > > > > > > Based on the feedback we've received, the previously proposed interface > > > is not viable. I think there's agreement that the user needs to be > > > able to parse and interpret the version information. Using json seems > > > viable, but I don't know if it's the best option. Is there any > > > precedent of markup strings returned via sysfs we could follow? > > > > I found some examples of using formatted string under /sys, mostly under > > tracing. 
maybe we can do a similar implementation. > > > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format > > > > name: kvm_mmio > > ID: 32 > > format: > > field:unsigned short common_type; offset:0; size:2; signed:0; > > field:unsigned char common_flags; offset:2; size:1; signed:0; > > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > > field:int common_pid; offset:4; size:4; signed:1; > > > > field:u32 type; offset:8; size:4; signed:0; > > field:u32 len; offset:12; size:4; signed:0; > > field:u64 gpa; offset:16; size:8; signed:0; > > field:u64 val; offset:24; size:8; signed:0; > > > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" > > }, { 2, "write" }), REC->len, REC->gpa, REC->val > > > this is not json fromat and its not supper frendly to parse. yes, it's just an example. It's exported to be used by userspace perf & trace_cmd. > > > > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent > > DRIVER=vfio-pci > > PCI_CLASS=30000 > > PCI_ID=8086:591D > > PCI_SUBSYS_ID=8086:2212 > > PCI_SLOT_NAME=0000:00:02.0 > > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > > > this is ini format or conf formant > this is pretty simple to parse whichi would be fine. > that said you could also have a version or capablitiy directory with a file > for each key and a singel value. > if this is easy for openstack, maybe we can organize the data like below way? |- [device] |- migration |-self |-compatible1 |-compatible2 e.g. #cat /sys/bus/pci/devices/0000:00:02.0/UUID1/migration/self filed1=xxx filed2=xxx filed3=xxx filed3=xxx #cat /sys/bus/pci/devices/0000:00:02.0/UUID1/migration/compatible filed1=xxx filed2=xxx filed3=xxx filed3=xxx or in a flat layer |- [device] |- migration-self-traits |- migration-compatible-traits I'm not sure whether json format in a single file is better, as I didn't find any precedent. > i would prefer to only have to do one read personally the list the files in > directory and then read tehm all ot build the datastucture myself but that is > doable though the simple ini format use d for uevent seams the best of 3 options > provided above. > > > > > > Your idea of having both a "self" object and an array of "compatible" > > > objects is perhaps something we can build on, but we must not assume > > > PCI devices at the root level of the object. Providing both the > > > mdev-type and the driver is a bit redundant, since the former includes > > > the latter. We can't have vendor specific versioning schemes though, > > > ie. gvt-version. We need to agree on a common scheme and decide which > > > fields the version is relative to, ex. just the mdev type? > > > > what about making all comparing fields vendor specific? > > userspace like openstack only needs to parse and compare if target > > device is within source compatible list without understanding the meaning > > of each field. > that kind of defeats the reason for having them be be parsable. > the reason openstack want to be able to understand the capablitys is so > we can staticaly declare the capablit of devices ahead of time on so our schduler > can select host based on that. is the keys and data are opaquce to userspace > becaue they are just random vendor sepecific blobs we cant do that. I heard that cyborg can parse the kernel interface and generate several traits without understanding the meaning of each trait. Then it reports those traits to placement for scheduling. 
but I agree if mdev creation is involved, those traits need to match to mdev attributes and mdev_type. could you explain a little how you plan to create a target mdev device? is it dynamically created during searching of compatible mdevs or just statically created before migration? > > > > > I had also proposed fields that provide information to create a > > > compatible type, for example to create a type_x2 device from a type_x1 > > > mdev type, they need to know to apply an aggregation attribute. If we > > > need to explicitly list every aggregation value and the resulting type, > > > I think we run aground of what aggregation was trying to avoid anyway, > > > so we might need to pick a language that defines variable substitution > > > or some kind of tagging. For example if we could define ${aggr} as an > > > integer within a specified range, then we might be able to define a type > > > relative to that value (type_x${aggr}) which requires an aggregation > > > attribute using the same value. I dunno, just spit balling. Thanks, > > > > what about a migration_compatible attribute under device node like > > below? > rather then listing comaptiable devices it would be better if you could declaritivly > list the feature supported and we could compare those along with a simple semver version string. I think below is already in a way of listing feature supported. The reason I also want to declare compatible lists of features is that sometimes it's not a simple 1:1 matching of source list and target list. as I demonstrated below, source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), (mdev_type i915-GVTg_V5_8 + aggregator 4) and aggragator may be just one of such examples that 1:1 matching is not fit. so I guess it's best not to leave the hard decision to openstack. Thanks Yan > > > > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible > > SELF: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_2 > > aggregator=1 > > pv_mode="none+ppgtt+context" > > interface_version=3 > > COMPATIBLE: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > this mixed notation will be hard to parse so i would avoid that. > > aggregator={val1}/2 > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > > > interface_version={val3:int:2,3} > > COMPATIBLE: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > aggregator={val1}/2 > > pv_mode="" #"" meaning empty, could be absent in a compatible device > > interface_version=1 > if you presented this information the only way i could see to use it would be to > extract the mdev_type name and interface_vertion and build a database table as follows > > source_mdev_type | source_version | target_mdev_type | target_version > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | {val3:int:2,3} > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | 1 > > this would either reuiqre use to use a post placment sechudler filter to itrospec this data base > or thansform the target_mdev_type and target_version colum data into CUSTOM_* traits we apply to > our placment resouce providers and we would have to prefrom multiple reuqest for each posible compatiable > alternitive. if the vm has muplite mdevs this is combinatorially problmenatic as it is 1 query for each > device * the number of possible compatible devices for that device. 
> > in other word if this is just opaque data we cant ever represent it efficently in our placment service and > have to fall back to an explisive post placment schdluer filter base on the db table approch. > > this also ignore the fact that at present the mdev_type cannot change druing a migration so the compatiable > devicve with a different mdev type would not be considerd accpetable choice in openstack. they way you select a host > with a specific vgpu mdev type today is to apply a custome trait which is CUSTOM_ to the vGPU > resouce provider and then in the flavor you request 1 allcoaton of vGPU and require the CUSTOM_ > trait. so going form i915-GVTg_V5_2 to i915-GVTg_V5_{val1:int:1,2,4,8} would not currently be compatiable with that > workflow. > > > > #cat /sys/bus/pci/dei915-GVTg_V5_{val1:int:1,2,4,8}vices/0000\:00\:i915- > > GVTg_V5_{val1:int:1,2,4,8}2.0/UUID2/migration_compatible > > SELF: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_4 > > aggregator=2 > > interface_version=1 > > COMPATIBLE: > > device_type=pci > > device_id=8086591d > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > aggregator={val1}/2 > > interface_version=1 > by the way this is closer to yaml format then it is to json but it does not align with any exsiting > format i know of so that just make the representation needless hard to consume > if we are going to use a markup lanag let use a standard one like yaml json or toml and not invent a new one. > > > > Notes: > > - A COMPATIBLE object is a line starting with COMPATIBLE. > > It specifies a list of compatible devices that are allowed to migrate > > in. > > The reason to allow multiple COMPATIBLE objects is that when it > > is hard to express a complex compatible logic in one COMPATIBLE > > object, a simple enumeration is still a fallback. > > in the above example, device UUID2 is in the compatible list of > > device UUID1, but device UUID1 is not in the compatible list of device > > UUID2, so device UUID2 is able to migrate to device UUID1, but device > > UUID1 is not able to migrate to device UUID2. > > > > - fields under each object are of "and" relationship to each other, meaning > > all fields of SELF object of a target device must be equal to corresponding > > fields of a COMPATIBLE object of source device, otherwise it is regarded as not > > compatible. > > > > - each field, however, is able to specify multiple allowed values, using > > variables as explained below. > > > > - variables are represented with {}, the first appearance of one variable > > specifies its type and allowed list. e.g. > > {val1:int:1,2,4,8} represents var1 whose type is integer and allowed > > values are 1, 2, 4, 8. > > > > - vendors are able to specify which fields are within the comparing list > > and which fields are not. e.g. for physical VF migration, it may not > > choose mdev_type as a comparing field, and maybe use driver name instead. > this format might be useful to vendors but from a orcestrator perspecive i dont think this has > value to us likely we would not use this api if it was added as it does not help us with schduling. > ideally instead fo declaring which other mdev types a device is compatiable with (which could presumably change over > time as new device and firmwares are released) i would prefer to see a declaritive non vendor specific api that declares > the feature set provided by each mdev_type from which we can infer comaptiablity similar to cpu feature flags. 
> for devices fo the same mdev_type name addtionally a declaritive version sting could also be used if required for > addtional compatiablity checks. > > > > > > Thanks > > Yan > > > > > From yan.y.zhao at intel.com Thu Jul 30 03:41:04 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 30 Jul 2020 11:41:04 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200729131255.68730f68@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200729131255.68730f68@x1.home> Message-ID: <20200730034104.GB32327@joy-OptiPlex-7040> On Wed, Jul 29, 2020 at 01:12:55PM -0600, Alex Williamson wrote: > On Wed, 29 Jul 2020 12:28:46 +0100 > Sean Mooney wrote: > > > On Wed, 2020-07-29 at 16:05 +0800, Yan Zhao wrote: > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > > On Mon, 27 Jul 2020 15:24:40 +0800 > > > > Yan Zhao wrote: > > > > > > > > > > > As you indicate, the vendor driver is responsible for checking version > > > > > > > information embedded within the migration stream. Therefore a > > > > > > > migration should fail early if the devices are incompatible. Is it > > > > > > > > > > > > but as I know, currently in VFIO migration protocol, we have no way to > > > > > > get vendor specific compatibility checking string in migration setup stage > > > > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > > > > In this way, for devices who does not save device data in precopy stage, > > > > > > the migration compatibility checking is as late as in stop-and-copy > > > > > > stage, which is too late. > > > > > > do you think we need to add the getting/checking of vendor specific > > > > > > compatibility string early in save_setup stage? > > > > > > > > > > > > > > > > hi Alex, > > > > > after an offline discussion with Kevin, I realized that it may not be a > > > > > problem if migration compatibility check in vendor driver occurs late in > > > > > stop-and-copy phase for some devices, because if we report device > > > > > compatibility attributes clearly in an interface, the chances for > > > > > libvirt/openstack to make a wrong decision is little. > > > > > > > > I think it would be wise for a vendor driver to implement a pre-copy > > > > phase, even if only to send version information and verify it at the > > > > target. Deciding you have no device state to send during pre-copy does > > > > not mean your vendor driver needs to opt-out of the pre-copy phase > > > > entirely. Please also note that pre-copy is at the user's discretion, > > > > we've defined that we can enter stop-and-copy at any point, including > > > > without a pre-copy phase, so I would recommend that vendor drivers > > > > validate compatibility at the start of both the pre-copy and the > > > > stop-and-copy phases. > > > > > > > > > > ok. got it! > > > > > > > > so, do you think we are now arriving at an agreement that we'll give up > > > > > the read-and-test scheme and start to defining one interface (perhaps in > > > > > json format), from which libvirt/openstack is able to parse and find out > > > > > compatibility list of a source mdev/physical device? 
> > > > > > > > Based on the feedback we've received, the previously proposed interface > > > > is not viable. I think there's agreement that the user needs to be > > > > able to parse and interpret the version information. Using json seems > > > > viable, but I don't know if it's the best option. Is there any > > > > precedent of markup strings returned via sysfs we could follow? > > > > > > I found some examples of using formatted string under /sys, mostly under > > > tracing. maybe we can do a similar implementation. > > > > > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format > > > > > > name: kvm_mmio > > > ID: 32 > > > format: > > > field:unsigned short common_type; offset:0; size:2; signed:0; > > > field:unsigned char common_flags; offset:2; size:1; signed:0; > > > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > > > field:int common_pid; offset:4; size:4; signed:1; > > > > > > field:u32 type; offset:8; size:4; signed:0; > > > field:u32 len; offset:12; size:4; signed:0; > > > field:u64 gpa; offset:16; size:8; signed:0; > > > field:u64 val; offset:24; size:8; signed:0; > > > > > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" > > > }, { 2, "write" }), REC->len, REC->gpa, REC->val > > > > > this is not json fromat and its not supper frendly to parse. > > > > > > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent > > > DRIVER=vfio-pci > > > PCI_CLASS=30000 > > > PCI_ID=8086:591D > > > PCI_SUBSYS_ID=8086:2212 > > > PCI_SLOT_NAME=0000:00:02.0 > > > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > > > > > this is ini format or conf formant > > this is pretty simple to parse whichi would be fine. > > that said you could also have a version or capablitiy directory with a file > > for each key and a singel value. > > > > i would prefer to only have to do one read personally the list the files in > > directory and then read tehm all ot build the datastucture myself but that is > > doable though the simple ini format use d for uevent seams the best of 3 options > > provided above. > > > > > > > > Your idea of having both a "self" object and an array of "compatible" > > > > objects is perhaps something we can build on, but we must not assume > > > > PCI devices at the root level of the object. Providing both the > > > > mdev-type and the driver is a bit redundant, since the former includes > > > > the latter. We can't have vendor specific versioning schemes though, > > > > ie. gvt-version. We need to agree on a common scheme and decide which > > > > fields the version is relative to, ex. just the mdev type? > > > > > > what about making all comparing fields vendor specific? > > > userspace like openstack only needs to parse and compare if target > > > device is within source compatible list without understanding the meaning > > > of each field. > > that kind of defeats the reason for having them be be parsable. > > the reason openstack want to be able to understand the capablitys is so > > we can staticaly declare the capablit of devices ahead of time on so our schduler > > can select host based on that. is the keys and data are opaquce to userspace > > becaue they are just random vendor sepecific blobs we cant do that. > > Agreed, I'm not sure I'm willing to rule out that there could be vendor > specific direct match fields, as I included in my example earlier in > the thread, but entirely vendor specific defeats much of the purpose > here. 
> > > > > I had also proposed fields that provide information to create a > > > > compatible type, for example to create a type_x2 device from a type_x1 > > > > mdev type, they need to know to apply an aggregation attribute. If we > > > > need to explicitly list every aggregation value and the resulting type, > > > > I think we run aground of what aggregation was trying to avoid anyway, > > > > so we might need to pick a language that defines variable substitution > > > > or some kind of tagging. For example if we could define ${aggr} as an > > > > integer within a specified range, then we might be able to define a type > > > > relative to that value (type_x${aggr}) which requires an aggregation > > > > attribute using the same value. I dunno, just spit balling. Thanks, > > > > > > what about a migration_compatible attribute under device node like > > > below? > > rather then listing comaptiable devices it would be better if you could declaritivly > > list the feature supported and we could compare those along with a simple semver version string. > > > > > > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible > > Note that we're defining compatibility relative to a vfio migration > interface, so we should include that in the name, we don't know what > other migration interfaces might exist. do you mean we need to name it as vfio_migration, e.g. /sys/bus/pci/devices/0000\:00\:02.0/UUID1/vfio_migration ? > > > > SELF: > > > device_type=pci > > Why not the device_api here, ie. vfio-pci. The device doesn't provide > a pci interface directly, it's wrapped in a vfio API. > the device_type is to indicate below device_id is a pci id. yes, include a device_api field is better. for mdev, "device_type=vfio-mdev", is it right? > > > device_id=8086591d > > Is device_id interpreted relative to device_type? How does this > relate to mdev_type? If we have an mdev_type, doesn't that fully > defined the software API? > it's parent pci id for mdev actually. > > > mdev_type=i915-GVTg_V5_2 > > And how are non-mdev devices represented? > non-mdev can opt to not include this field, or as you said below, a vendor signature. > > > aggregator=1 > > > pv_mode="none+ppgtt+context" > > These are meaningless vendor specific matches afaict. > yes, pv_mode and aggregator are vendor specific fields. but they are important to decide whether two devices are compatible. pv_mode means whether a vGPU supports guest paravirtualized api. "none+ppgtt+context" means guest can not use pv, or use ppgtt mode pv or use context mode pv. > > > interface_version=3 > > Not much granularity here, I prefer Sean's previous > .[.bugfix] scheme. > yes, .[.bugfix] scheme may be better, but I'm not sure if it works for a complicated scenario. e.g for pv_mode, (1) initially, pv_mode is not supported, so it's pv_mode=none, it's 0.0.0, (2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0, indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa. (3) later, pv_mode=context is also supported, pv_mode="none+ppgtt+context", so it's 0.2.0. But if later, pv_mode=ppgtt is removed. pv_mode="none+context", how to name its version? "none+ppgtt" (0.1.0) is not compatible to "none+context", but "none+ppgtt+context" (0.2.0) is compatible to "none+context". Maintain such scheme is painful to vendor driver. > > > COMPATIBLE: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > this mixed notation will be hard to parse so i would avoid that. 
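To illustrate the pv_mode point above: if we compare explicit feature sets rather than trying to order them with a single version number, removing ppgtt later is no longer a problem. A toy sketch only, with made-up values:

# Toy illustration: treat pv_mode as a set of independent feature flags
# instead of a single ordered version number.
def pv_compatible(source_modes, target_modes):
    # A target can accept the source if every mode the source guest might
    # currently be using is also understood by the target.
    return set(source_modes) <= set(target_modes)

assert pv_compatible({"none", "ppgtt"}, {"none", "ppgtt", "context"})
# A driver that later drops ppgtt is just a smaller set; there is no need
# to invent a version number that is somehow both newer and "less than".
assert not pv_compatible({"none", "ppgtt"}, {"none", "context"})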
> > Some background, Intel has been proposing aggregation as a solution to > how we scale mdev devices when hardware exposes large numbers of > assignable objects that can be composed in essentially arbitrary ways. > So for instance, if we have a workqueue (wq), we might have an mdev > type for 1wq, 2wq, 3wq,... Nwq. It's not really practical to expose a > discrete mdev type for each of those, so they want to define a base > type which is composable to other types via this aggregation. This is > what this substitution and tagging is attempting to accomplish. So > imagine this set of values for cases where it's not practical to unroll > the values for N discrete types. > > > > aggregator={val1}/2 > > So the {val1} above would be substituted here, though an aggregation > factor of 1/2 is a head scratcher... > > > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > I'm lost on this one though. I think maybe it's indicating that it's > compatible with any of these, so do we need to list it? Couldn't this > be handled by Sean's version proposal where the minor version > represents feature compatibility? yes, it's indicating that it's compatible with any of these. Sean's version proposal may also work, but it would be painful for vendor driver to maintain the versions when multiple similar features are involved. > > > > > > > interface_version={val3:int:2,3} > > What does this turn into in a few years, 2,7,12,23,75,96,... > is a range better? > > > COMPATIBLE: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > aggregator={val1}/2 > > > pv_mode="" #"" meaning empty, could be absent in a compatible device > > > interface_version=1 > > Why can't this be represented within the previous compatible > description? > actually it can be merged with the previous one :) But I guess there must be one that cannot merge, so put it as an example to demo multiple COMPATIBLE objects. Thanks Yan > > if you presented this information the only way i could see to use it would be to > > extract the mdev_type name and interface_vertion and build a database table as follows > > > > source_mdev_type | source_version | target_mdev_type | target_version > > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | {val3:int:2,3} > > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | 1 > > > > this would either reuiqre use to use a post placment sechudler filter to itrospec this data base > > or thansform the target_mdev_type and target_version colum data into CUSTOM_* traits we apply to > > our placment resouce providers and we would have to prefrom multiple reuqest for each posible compatiable > > alternitive. if the vm has muplite mdevs this is combinatorially problmenatic as it is 1 query for each > > device * the number of possible compatible devices for that device. > > > > in other word if this is just opaque data we cant ever represent it efficently in our placment service and > > have to fall back to an explisive post placment schdluer filter base on the db table approch. > > > > this also ignore the fact that at present the mdev_type cannot change druing a migration so the compatiable > > devicve with a different mdev type would not be considerd accpetable choice in openstack. they way you select a host > > with a specific vgpu mdev type today is to apply a custome trait which is CUSTOM_ to the vGPU > > resouce provider and then in the flavor you request 1 allcoaton of vGPU and require the CUSTOM_ > > trait. 
so going form i915-GVTg_V5_2 to i915-GVTg_V5_{val1:int:1,2,4,8} would not currently be compatiable with that > > workflow. > > The latter would need to be parsed into: > > i915-GVTg_V5_1 > i915-GVTg_V5_2 > i915-GVTg_V5_4 > i915-GVTg_V5_8 > > There is also on the table, migration from physical devices to mdev > devices (or vice versa), which is not represented in these examples, > nor do I see how we'd represent it. This is where I started exposing > the resulting PCI device from the mdev in my example so we could have > some commonality between devices, but the migration stream provider is > just as important as the type of device, we could have different host > drivers providing the same device with incompatible migration streams. > The mdev_type encompasses both the driver and device, but we wouldn't > have mdev_types for physical devices, per our current thinking. > > > > > #cat /sys/bus/pci/dei915-GVTg_V5_{val1:int:1,2,4,8}vices/0000\:00\:i915- > > > GVTg_V5_{val1:int:1,2,4,8}2.0/UUID2/migration_compatible > > > SELF: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_4 > > > aggregator=2 > > > interface_version=1 > > > COMPATIBLE: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > aggregator={val1}/2 > > > interface_version=1 > > by the way this is closer to yaml format then it is to json but it does not align with any exsiting > > format i know of so that just make the representation needless hard to consume > > if we are going to use a markup lanag let use a standard one like yaml json or toml and not invent a new one. > > > > > > Notes: > > > - A COMPATIBLE object is a line starting with COMPATIBLE. > > > It specifies a list of compatible devices that are allowed to migrate > > > in. > > > The reason to allow multiple COMPATIBLE objects is that when it > > > is hard to express a complex compatible logic in one COMPATIBLE > > > object, a simple enumeration is still a fallback. > > > in the above example, device UUID2 is in the compatible list of > > > device UUID1, but device UUID1 is not in the compatible list of device > > > UUID2, so device UUID2 is able to migrate to device UUID1, but device > > > UUID1 is not able to migrate to device UUID2. > > > > > > - fields under each object are of "and" relationship to each other, meaning > > > all fields of SELF object of a target device must be equal to corresponding > > > fields of a COMPATIBLE object of source device, otherwise it is regarded as not > > > compatible. > > > > > > - each field, however, is able to specify multiple allowed values, using > > > variables as explained below. > > > > > > - variables are represented with {}, the first appearance of one variable > > > specifies its type and allowed list. e.g. > > > {val1:int:1,2,4,8} represents var1 whose type is integer and allowed > > > values are 1, 2, 4, 8. > > > > > > - vendors are able to specify which fields are within the comparing list > > > and which fields are not. e.g. for physical VF migration, it may not > > > choose mdev_type as a comparing field, and maybe use driver name instead. > > this format might be useful to vendors but from a orcestrator > > perspecive i dont think this has value to us likely we would not use > > this api if it was added as it does not help us with schduling. 
> > ideally instead fo declaring which other mdev types a device is > > compatiable with (which could presumably change over time as new > > device and firmwares are released) i would prefer to see a > > declaritive non vendor specific api that declares the feature set > > provided by each mdev_type from which we can infer comaptiablity > > similar to cpu feature flags. for devices fo the same mdev_type name > > addtionally a declaritive version sting could also be used if > > required for addtional compatiablity checks. > > "non vendor specific api that declares the feature set", aren't > features generally vendor specific? What we're trying to describe is, > by it's very nature, vendor specific. We don't have an ISO body > defining a graphics adapter and enumerating features for that adapter. > I think what we have is mdev_types. Each type is supposed to define a > specific software interface, perhaps even more so than is done by a PCI > vendor:device ID. Maybe that mdev_type needs to be abstracted as > something more like a vendor signature, such that a physical device > could provide or accept a vendor signature that's compatible with an > mdev device. For example, a physically assigned Intel GPU might expose > a migration signature of i915-GVTg_v5_8 if it were designed to be > compatible with that mdev_type. Thanks, > > Alex > From liuzhenjie at bonc.com.cn Thu Jul 30 03:57:34 2020 From: liuzhenjie at bonc.com.cn (liuzhenjie at bonc.com.cn) Date: Thu, 30 Jul 2020 11:57:34 +0800 Subject: networking-l2gw Message-ID: <202007301157342887612@bonc.com.cn> Hello: I would like to use the l2gw service. However, my environment is deployed with kolla-ansible, and after this service is brought up inside the neutron-server container, creating resources fails with the error shown in the attached screenshot (attachment Catch.jpg was scrubbed from the archive). My ovsdb server is an x86 machine with OVS installed. After the l2gw service comes up, I can see that no ovsdb connection is established between neutron-server and that x86 server. From smooney at redhat.com Thu Jul 30 13:14:35 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 30 Jul 2020 14:14:35 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200730015639.GA32327@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200730015639.GA32327@joy-OptiPlex-7040> Message-ID: On Thu, 2020-07-30 at 09:56 +0800, Yan Zhao wrote: > On Wed, Jul 29, 2020 at 12:28:46PM +0100, Sean Mooney wrote: > > On Wed, 2020-07-29 at 16:05 +0800, Yan Zhao wrote: > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > > On Mon, 27 Jul 2020 15:24:40 +0800 > > > > Yan Zhao wrote: > > > > > > > > > > > As you indicate, the vendor driver is responsible for checking version > > > > > > > information embedded within the migration stream.
Therefore a > > > > > > > migration should fail early if the devices are incompatible. Is it > > > > > > > > > > > > but as I know, currently in VFIO migration protocol, we have no way to > > > > > > get vendor specific compatibility checking string in migration setup stage > > > > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > > > > In this way, for devices who does not save device data in precopy stage, > > > > > > the migration compatibility checking is as late as in stop-and-copy > > > > > > stage, which is too late. > > > > > > do you think we need to add the getting/checking of vendor specific > > > > > > compatibility string early in save_setup stage? > > > > > > > > > > > > > > > > hi Alex, > > > > > after an offline discussion with Kevin, I realized that it may not be a > > > > > problem if migration compatibility check in vendor driver occurs late in > > > > > stop-and-copy phase for some devices, because if we report device > > > > > compatibility attributes clearly in an interface, the chances for > > > > > libvirt/openstack to make a wrong decision is little. > > > > > > > > I think it would be wise for a vendor driver to implement a pre-copy > > > > phase, even if only to send version information and verify it at the > > > > target. Deciding you have no device state to send during pre-copy does > > > > not mean your vendor driver needs to opt-out of the pre-copy phase > > > > entirely. Please also note that pre-copy is at the user's discretion, > > > > we've defined that we can enter stop-and-copy at any point, including > > > > without a pre-copy phase, so I would recommend that vendor drivers > > > > validate compatibility at the start of both the pre-copy and the > > > > stop-and-copy phases. > > > > > > > > > > ok. got it! > > > > > > > > so, do you think we are now arriving at an agreement that we'll give up > > > > > the read-and-test scheme and start to defining one interface (perhaps in > > > > > json format), from which libvirt/openstack is able to parse and find out > > > > > compatibility list of a source mdev/physical device? > > > > > > > > Based on the feedback we've received, the previously proposed interface > > > > is not viable. I think there's agreement that the user needs to be > > > > able to parse and interpret the version information. Using json seems > > > > viable, but I don't know if it's the best option. Is there any > > > > precedent of markup strings returned via sysfs we could follow? > > > > > > I found some examples of using formatted string under /sys, mostly under > > > tracing. maybe we can do a similar implementation. > > > > > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format > > > > > > name: kvm_mmio > > > ID: 32 > > > format: > > > field:unsigned short common_type; offset:0; size:2; signed:0; > > > field:unsigned char common_flags; offset:2; size:1; signed:0; > > > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > > > field:int common_pid; offset:4; size:4; signed:1; > > > > > > field:u32 type; offset:8; size:4; signed:0; > > > field:u32 len; offset:12; size:4; signed:0; > > > field:u64 gpa; offset:16; size:8; signed:0; > > > field:u64 val; offset:24; size:8; signed:0; > > > > > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, > > > "read" > > > }, { 2, "write" }), REC->len, REC->gpa, REC->val > > > > > > > this is not json fromat and its not supper frendly to parse. > > yes, it's just an example. 
It's exported to be used by userspace perf & > trace_cmd. > > > > > > > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent > > > DRIVER=vfio-pci > > > PCI_CLASS=30000 > > > PCI_ID=8086:591D > > > PCI_SUBSYS_ID=8086:2212 > > > PCI_SLOT_NAME=0000:00:02.0 > > > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > > > > > > > this is ini format or conf formant > > this is pretty simple to parse whichi would be fine. > > that said you could also have a version or capablitiy directory with a file > > for each key and a singel value. > > > > if this is easy for openstack, maybe we can organize the data like below way? > > |- [device] > |- migration > |-self > |-compatible1 > |-compatible2 > > e.g. > #cat /sys/bus/pci/devices/0000:00:02.0/UUID1/migration/self > filed1=xxx > filed2=xxx > filed3=xxx > filed3=xxx > #cat /sys/bus/pci/devices/0000:00:02.0/UUID1/migration/compatible > filed1=xxx > filed2=xxx > filed3=xxx > filed3=xxx Yeah, this would work. In nova, specifically in the libvirt driver, we try to avoid reading sysfs directly if libvirt has an API that provides the information, but where it does not we can read it, and that structure would be easy for us to consume. Libraries like os-vif, which can't depend on libvirt, use it a little more, for example to look up a PF from one of its VFs: https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/linux_net.py#L384-L391 We are careful not to over-use sysfs, as it can change over time based on kernel version in some cases, but it is generally seen as preferable to calling an ever-growing list of command-line clients to retrieve the same info. > > or in a flat layer > |- [device] > |- migration-self-traits > |- migration-compatible-traits > > I'm not sure whether json format in a single file is better, as I didn't > find any precedent. I think I prefer the nested directories to this flattened style, but there isn't really any significant difference in complexity from a bash scripting point of view; if I was manually debugging something the multi-layer representation is slightly simpler to work with, but not enough that it really matters. > > > i would prefer to only have to do one read personally the list the files in > > directory and then read tehm all ot build the datastucture myself but that is > > doable though the simple ini format use d for uevent seams the best of 3 options > > provided above. > > > > > > Your idea of having both a "self" object and an array of "compatible" > > > > objects is perhaps something we can build on, but we must not assume > > > > PCI devices at the root level of the object. Providing both the > > > > mdev-type and the driver is a bit redundant, since the former includes > > > > the latter. We can't have vendor specific versioning schemes though, > > > > ie. gvt-version. We need to agree on a common scheme and decide which > > > > fields the version is relative to, ex. just the mdev type? > > > > > > what about making all comparing fields vendor specific? > > > userspace like openstack only needs to parse and compare if target > > > device is within source compatible list without understanding the meaning > > > of each field. > > > > that kind of defeats the reason for having them be be parsable. > > the reason openstack want to be able to understand the capablitys is so > > we can staticaly declare the capablit of devices ahead of time on so our schduler > > can select host based on that. is the keys and data are opaquce to userspace > > becaue they are just random vendor sepecific blobs we cant do that.
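To make that concrete, something like the following is roughly what the consuming code on our side could look like, assuming the nested layout above (a migration/self file plus one or more migration/compatible* files, each holding uevent-style key=value lines). This is just a sketch: the paths and field handling are assumptions, and it deliberately ignores the {val...} substitution notation, which is part of my objection to it:

import os

def read_attrs(path):
    # Each file holds simple uevent-style key=value lines.
    attrs = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and "=" in line:
                key, _, value = line.partition("=")
                attrs[key] = value
    return attrs

def read_migration_info(device_dir):
    mig_dir = os.path.join(device_dir, "migration")
    self_attrs = read_attrs(os.path.join(mig_dir, "self"))
    compatibles = [read_attrs(os.path.join(mig_dir, name))
                   for name in sorted(os.listdir(mig_dir))
                   if name.startswith("compatible")]
    return self_attrs, compatibles

def can_migrate(source_dir, target_dir):
    # Naive reading of the proposal: allow migration if the target's "self"
    # description matches every field of at least one "compatible"
    # description exported by the source device.
    _, source_compat = read_migration_info(source_dir)
    target_self, _ = read_migration_info(target_dir)
    return any(all(target_self.get(key) == value
                   for key, value in entry.items())
               for entry in source_compat)

# e.g. can_migrate("/sys/bus/pci/devices/0000:00:02.0/UUID1",
#                  "/sys/bus/pci/devices/0000:81:00.0/UUID2")

Even with something like this, we would still have to turn those keys into placement traits ahead of time to use them for scheduling, which is why opaque vendor blobs don't really help us.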
> > I heard that cyborg can parse the kernel interface and generate several > traits without understanding the meaning of each trait. Then it reports > those traits to placement for scheduling. If it is doing a raw passthrough like that, first it will break users if a vendor ever removes a trait or renames it as part of a firmware update, and second it will require them to use CUSTOM_ traits instead of standardised traits; in other words it is an interoperability problem between clouds. At present cyborg does not support mdevs; there is a proposal for adding a generic mdev driver for generic stateless devices. It could report arbitrary capabilities to placement, although it does not exist yet, so it's kind of premature to point to it as an example. > > but I agree if mdev creation is involved, those traits need to match > to mdev attributes and mdev_type. Currently the only use of mdevs in openstack is for vGPU with nvidia devices. In theory intel gpus can work with the existing code but it has not been tested. > > could you explain a little how you plan to create a target mdev device? > is it dynamically created during searching of compatible mdevs or just statically > created before migration? The mdevs are currently created dynamically when a VM is created, based on a set of pre-defined flavors which have static metadata in the form of flavor extra_specs. Those extra specs request a vGPU by specifying resources:VGPU=1 in the extra specs, e.g.

openstack flavor set vgpu_1 --property "resources:VGPU=1"

If you want a specific vGPU type then you must request a custom trait in addition to the resource class:

openstack --os-placement-api-version 1.6 trait create CUSTOM_NVIDIA_11
openstack flavor set --property trait:CUSTOM_NVIDIA_11=required vgpu_1

When configuring the host for vGPUs you list the enabled vGPU mdev types and the devices that can use them:

[devices]
enabled_vgpu_types = nvidia-35, nvidia-36

[vgpu_nvidia-35]
device_addresses = 0000:84:00.0,0000:85:00.0

[vgpu_nvidia-36]
device_addresses = 0000:86:00.0

Each device that is listed will be created as a resource provider in the placement service, so to associate the custom trait with the physical GPU and mdev type you manually tag the RP with the trait:

openstack --os-placement-api-version 1.6 resource provider trait set \
    --trait CUSTOM_NVIDIA_11 e2f8607b-0683-4141-a8af-f5e20682e28c

This decouples the name of the CUSTOM_ trait from the underlying mdev type, so the operator is free to use small|medium|large or bronze|silver|gold if they want to, or they could choose to use the mdev_type name if they want to. Currently we don't support live migration with vGPU because the required code has not been upstreamed to qemu/kvm yet; I believe it just missed the kernel 5.7 merge window. I know it's in flight but have not been following too closely. If you do a cold/offline migration currently and you had multiple mdev types then technically the mdev type could change. We had planned for operators to ensure that whatever trait they choose would map to the same mdev type on all hosts. If we were to support live migration in the future without this new API we are discussing, we would make the trait to mdev type relationship required to be 1:1 for live migration. We have talked about auto-creating traits for vGPUs, which would be in the form of CUSTOM_ , but shied away from it as we are worried vendors will break us and our users by changing mdev_types in firmware updates or driver updates.
we kind of need to rely on them being stable but we are hesitent to encode them in our public api in this manner. > > > > > > > I had also proposed fields that provide information to create a > > > > compatible type, for example to create a type_x2 device from a type_x1 > > > > mdev type, they need to know to apply an aggregation attribute. honestly form an opesntack point of view i woudl prefer if each consumable resouce was exposed as a different mdev_type and we could just create multiple mdevs and attach them to a vm. that would allow use to do the aggreatation our selves. parsing mdev atributes and dynamicaly creating 1 mdev type from aggregation of other requires detailed knoladge of the vendor device. the cyborg(acclerator managment) project might be open to this becuase they have plugable vendor specific and could write a driver that only work with a sepecifc sku of a vendoer deivce or a device familay e.g. a qat driver that could have the require knoladge to do the compostion. that type of lowlevel device management is out of scope of the nova (compute) project we woudl be far more likely to require operator to staticly parttion the device up front into mdevs and pass us a list of them which we could then provend to vms. we more or less already do this for vGPU today as the phsycal gpus need to be declared to support exactly 1 mdev_type each and the same is true for persistent memroy. you need to pre create the persistent memeroy namespaces and then provide the list of namespaces to nova. so aggregation is something i suspect taht will only be supported in cyborg if it eventually supprot mdevs. it has not been requested or assesed for nova yet but it seams unlikely. in a migration work flow i would expect the nova conduction or source host to make an rpc call to the destination host in pre live migration to create the mdev. this is before the call to libvirt to migrate the vm and before it would do any validation but after schduleing. so ideally we shoudl know at this point that the destination host has a comaptiable device. > > > > If we > > > > need to explicitly list every aggregation value and the resulting type, > > > > I think we run aground of what aggregation was trying to avoid anyway, > > > > so we might need to pick a language that defines variable substitution > > > > or some kind of tagging. For example if we could define ${aggr} as an > > > > integer within a specified range, then we might be able to define a type > > > > relative to that value (type_x${aggr}) which requires an aggregation > > > > attribute using the same value. I dunno, just spit balling. Thanks, > > > > > > what about a migration_compatible attribute under device node like > > > below? > > > > rather then listing comaptiable devices it would be better if you could declaritivly > > list the feature supported and we could compare those along with a simple semver version string. > > I think below is already in a way of listing feature supported. > The reason I also want to declare compatible lists of features is that > sometimes it's not a simple 1:1 matching of source list and target list. > as I demonstrated below, > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2), > (mdev_type i915-GVTg_V5_8 + aggregator 4) > > and aggragator may be just one of such examples that 1:1 matching is not > fit. so far i am not conviced that aggragators are a good concept to model at this level. 
is there some document that explains why they are needed and why we can't just have multiple mdev_types per consumable resource and attach multiple mdevs to a single vm? i suspect this is due to limitations in composability in the hardware, such as nvidia's multi-instance gpu tech. however (mdev_type i915-GVTg_V5_8 + aggregator 4) seems unfriendly to work with from an orchestrator perspective. one of our current complaints with the mdev api today is that, depending on the device, consuming an instance of 1 mdev type may impact the availability of others or change the available capacity of others. that makes it very hard to reason about capacity availability, and aggregators sound like they will make that problem worse, not better.

> so I guess it's best not to leave the hard decision to openstack. > > Thanks > Yan > > > > > > > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible > > > SELF: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_2 > > > aggregator=1 > > > pv_mode="none+ppgtt+context" > > > interface_version=3 > > > COMPATIBLE: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > > this mixed notation will be hard to parse so i would avoid that. > > > aggregator={val1}/2 > > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > > > > > interface_version={val3:int:2,3} > > > COMPATIBLE: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > aggregator={val1}/2 > > > pv_mode="" #"" meaning empty, could be absent in a compatible device > > > interface_version=1

> > if you presented this information the only way i could see to use it would be to > > extract the mdev_type name and interface_vertion and build a database table as follows > > > > source_mdev_type | source_version | target_mdev_type | target_version > > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | {val3:int:2,3} > > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | 1 > > > > this would either reuiqre use to use a post placment sechudler filter to itrospec this data base > > or thansform the target_mdev_type and target_version colum data into CUSTOM_* traits we apply to > > our placment resouce providers and we would have to prefrom multiple reuqest for each posible compatiable > > alternitive. if the vm has muplite mdevs this is combinatorially problmenatic as it is 1 query for each > > device * the number of possible compatible devices for that device. > > > > in other word if this is just opaque data we cant ever represent it efficently in our placment service and > > have to fall back to an explisive post placment schdluer filter base on the db table approch. > > > > this also ignore the fact that at present the mdev_type cannot change druing a migration so the compatiable > > devicve with a different mdev type would not be considerd accpetable choice in openstack. they way you select a host > > with a specific vgpu mdev type today is to apply a custome trait which is CUSTOM_ to the vGPU > > resouce provider and then in the flavor you request 1 allcoaton of vGPU and require the > > CUSTOM_ > > trait. so going form i915-GVTg_V5_2 to i915-GVTg_V5_{val1:int:1,2,4,8} would not currently be compatiable with that > > workflow.
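to illustrate the point in the quoted text above about how an orchestrator would have to consume this, here is a rough sketch (illustrative only, not code that exists in nova or placement) of unrolling one templated compatibility entry into the discrete rows we would have to store in such a table or query for:

# illustrative sketch only: unroll a templated compatibility entry such as
# "i915-GVTg_V5_{val1:int:1,2,4,8}" with interface versions {2,3} into the
# discrete (mdev_type, interface_version) rows an orchestrator would have
# to consider for a single source device.

import itertools

type_values = [1, 2, 4, 8]   # from {val1:int:1,2,4,8}
version_values = [2, 3]      # from {val3:int:2,3}

compatible_targets = [
    (f"i915-GVTg_V5_{v}", ver)
    for v, ver in itertools.product(type_values, version_values)
]

print(len(compatible_targets), "alternatives for one source device")
print(compatible_targets)
# with multiple mdevs per vm this multiplies out: one request per possible
# alternative per device, which is the combinatorial problem described above.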
> > > > > > > #cat /sys/bus/pci/dei915-GVTg_V5_{val1:int:1,2,4,8}vices/0000\:00\:i915- > > > GVTg_V5_{val1:int:1,2,4,8}2.0/UUID2/migration_compatible > > > SELF: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_4 > > > aggregator=2 > > > interface_version=1 > > > COMPATIBLE: > > > device_type=pci > > > device_id=8086591d > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > aggregator={val1}/2 > > > interface_version=1 > > > > by the way this is closer to yaml format then it is to json but it does not align with any exsiting > > format i know of so that just make the representation needless hard to consume > > if we are going to use a markup lanag let use a standard one like yaml json or toml and not invent a new one. > > > > > > Notes: > > > - A COMPATIBLE object is a line starting with COMPATIBLE. > > > It specifies a list of compatible devices that are allowed to migrate > > > in. > > > The reason to allow multiple COMPATIBLE objects is that when it > > > is hard to express a complex compatible logic in one COMPATIBLE > > > object, a simple enumeration is still a fallback. > > > in the above example, device UUID2 is in the compatible list of > > > device UUID1, but device UUID1 is not in the compatible list of device > > > UUID2, so device UUID2 is able to migrate to device UUID1, but device > > > UUID1 is not able to migrate to device UUID2. > > > > > > - fields under each object are of "and" relationship to each other, meaning > > > all fields of SELF object of a target device must be equal to corresponding > > > fields of a COMPATIBLE object of source device, otherwise it is regarded as not > > > compatible. > > > > > > - each field, however, is able to specify multiple allowed values, using > > > variables as explained below. > > > > > > - variables are represented with {}, the first appearance of one variable > > > specifies its type and allowed list. e.g. > > > {val1:int:1,2,4,8} represents var1 whose type is integer and allowed > > > values are 1, 2, 4, 8. > > > > > > - vendors are able to specify which fields are within the comparing list > > > and which fields are not. e.g. for physical VF migration, it may not > > > choose mdev_type as a comparing field, and maybe use driver name instead. > > > > this format might be useful to vendors but from a orcestrator perspecive i dont think this has > > value to us likely we would not use this api if it was added as it does not help us with schduling. > > ideally instead fo declaring which other mdev types a device is compatiable with (which could presumably change over > > time as new device and firmwares are released) i would prefer to see a declaritive non vendor specific api that > > declares > > the feature set provided by each mdev_type from which we can infer comaptiablity similar to cpu feature flags. > > for devices fo the same mdev_type name addtionally a declaritive version sting could also be used if required for > > addtional compatiablity checks. 
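to make the preference in the quoted text above a bit more concrete, here is a rough sketch of the comparison an orchestrator could do if each mdev_type simply exposed a flat, cpu-flag style feature list plus a simple version string. this is purely illustrative: the data layout is made up and none of this exists in the kernel or in openstack today.

# illustrative sketch only: compare a cpu-flag style feature set plus a
# simple major.minor version string for two devices of the same mdev_type.
# the dictionaries stand in for data that would be read from a hypothetical
# per-mdev_type sysfs attribute; nothing here is an existing interface.

source = {"mdev_type": "i915-GVTg_V5_2", "version": "1.2",
          "features": {"ppgtt", "context"}}
target = {"mdev_type": "i915-GVTg_V5_2", "version": "1.3",
          "features": {"ppgtt", "context", "some_new_feature"}}


def can_migrate(source, target):
    if source["mdev_type"] != target["mdev_type"]:
        return False
    src_major, src_minor = (int(x) for x in source["version"].split("."))
    dst_major, dst_minor = (int(x) for x in target["version"].split("."))
    # same major required; target must be at least as new as the source
    if dst_major != src_major or dst_minor < src_minor:
        return False
    # every feature the source guest may rely on must exist on the target
    return source["features"] <= target["features"]


print(can_migrate(source, target))  # True in this made-up example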
> > > > > > > > > Thanks > > > Yan > > > > >

From smooney at redhat.com Thu Jul 30 13:24:31 2020 From: smooney at redhat.com (Sean Mooney) Date: Thu, 30 Jul 2020 14:24:31 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200730034104.GB32327@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200729131255.68730f68@x1.home> <20200730034104.GB32327@joy-OptiPlex-7040> Message-ID:

On Thu, 2020-07-30 at 11:41 +0800, Yan Zhao wrote: > > > > interface_version=3 > > > > Not much granularity here, I prefer Sean's previous > > .[.bugfix] scheme. > > > > yes, .[.bugfix] scheme may be better, but I'm not sure if > it works for a complicated scenario. > e.g for pv_mode, > (1) initially, pv_mode is not supported, so it's pv_mode=none, it's 0.0.0, > (2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0, > indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa. > (3) later, pv_mode=context is also supported, > pv_mode="none+ppgtt+context", so it's 0.2.0. > > But if later, pv_mode=ppgtt is removed. pv_mode="none+context", how to > name its version?

it would become 1.0.0. addition of a feature is a minor version bump as it is backwards compatible: if you don't request the new feature you don't need to use it, and the device can continue to behave like a 0.0.0 device even if it is capable of acting as a 0.1.0 device. when you remove a feature, that is backwards incompatible, as any instance that was previously using it would no longer work, so you have to bump the major version.

> "none+ppgtt" (0.1.0) is not compatible to > "none+context", but "none+ppgtt+context" (0.2.0) is compatible to > "none+context". > > Maintain such scheme is painful to vendor driver.

not really, it is how most software libraries are versioned today. some use other schemes, but semantic versioning done right is a concise and easy-to-consume set of rules: https://semver.org/ however you are right that it forces vendors to think about backwards and forwards compatibility with each change, which for the most part is a good thing. it goes hand in hand with having stable abi and api definitions to ensure firmware updates and driver changes don't break userspace that depends on the kernel interfaces they expose.

From sean.mcginnis at gmx.com Thu Jul 30 15:51:34 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Thu, 30 Jul 2020 10:51:34 -0500 Subject: [ops] Reviving OSOps ? In-Reply-To: References: <20200716133127.GA31915@sync> <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> <118b071b-955c-8164-59d4-0c941e27c335@nemebean.com> <702d78f5-6db8-154e-03ae-6eee0e3dde4e@gmx.com> Message-ID: <5e80d023-f945-4b70-067b-aecc0664299c@gmx.com>

I have proposed https://review.opendev.org/#/c/744005/ to expand the scope of the Operations Docs SIG to include tooling like this. Sean On 7/30/20 7:52 AM, Chris Morgan wrote: > +1 to put these in the Operations Docs SIG > > On Wed, Jul 29, 2020 at 12:25 AM Fabian Zimmermann > wrote: > > +1 > > Laurent Dumont > schrieb am Mi., 29. Juli 2020, > 04:00: > > Interested in this as well.
We use Openstack a $Dayjob :) > > On Mon, Jul 27, 2020 at 2:52 PM Amy Marrich > wrote: > > +1 on combining this in with the existing SiG and efforts. > > Amy (spotz) > > On Mon, Jul 27, 2020 at 1:02 PM Sean McGinnis > > wrote: > > > >> If Osops should be considered distinct from OpenStack > > > > That feels like the wrong statement to make, even if > only implicitly > > by repo organization. Is there a compelling reason > not to have osops > > under the openstack namespace? > > > I think it makes the most sense to be under the > openstack namespace. > > We have the Operations Docs SIG right now that took on > some of the > operator-specific documentation that no longer had a > home. This was a > consistent issue brought up in the Ops Meetup events. > While not "wildly > successful" in getting a bunch of new and updated > docs, it at least has > accomplished the main goal of getting these docs > published to > docs.openstack.org again, > and providing a place where more collaboration > can (and occasionally does) happen to improve those docs. > > I think we could probably expand the scope of this > SIG. Especially > considering it is a pretty low-volume SIG anyway. I > would be good with > changing this to something like the "Operator Docs and > Tooling SIG" and > getting any of these useful tooling repos under > governance through that. > I personally wouldn't be able to spend a lot of time > working on anything > under the SIG, but I'd be happy to keep an eye out for > any new reviews > and help get those through. > > Sean > > > > > -- > Chris Morgan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Thu Jul 30 16:05:50 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 30 Jul 2020 09:05:50 -0700 Subject: [octavia] Proposing Ann Taraday and Gregory Thiemonge as Octavia core reviewers Message-ID: Hello Octavia community, I would like to propose Ann Taraday (ataraday_) and Gregory Thiemonge (gthiemonge) as core reviewers on the Octavia project. Both Ann and Gregory have made significant contributions to the Octavia code base and have provided quality code reviews. Over the last two release cycles Ann has lead the addition of Taskflow jobboard support to the amphora v2 driver. Gregory has worked on improving our tempest scenario test coverage and enhancing the Octavia OpenStack client plugin. I think that both would make excellent additions to the Octavia core reviewer team. Existing Octavia core reviewers, please reply to this email with your support or concerns with adding Ann and Gregory to the core team. Michael From skaplons at redhat.com Thu Jul 30 16:06:55 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 30 Jul 2020 18:06:55 +0200 Subject: [neutron][networking-midonet] Maintainers needed In-Reply-To: <0AC5AC07-E97E-43CC-B344-A3E992B8CCA4@netways.de> References: <0AC5AC07-E97E-43CC-B344-A3E992B8CCA4@netways.de> Message-ID: <610412AF-AADF-44BD-ABA2-BA289B7C8F8A@redhat.com> Hi, Thx Sebastian for stepping in to maintain the project. That is great news. 
I think that at the beginning You should do 2 things: - sync with Takashi Yamamoto (I added him to the loop) as he is probably most active current maintainer of this project, - focus on fixing networking-midonet ci which is currently broken - all scenario jobs aren’t working fine on Ubuntu 18.04 (and we are going to move to 20.04 in this cycle), migrate jobs to zuulv3 from the legacy ones and finally add them to the ci again, I can of course help You with ci jobs if You need any help. Feel free to ping me on IRC or email (can be off the list). > On 29 Jul 2020, at 15:24, Sebastian Saemann wrote: > > Hi Slawek, > > we at NETWAYS are running most of our neutron networking on top of midonet and wouldn't be too happy if it gets deprecated and removed. So we would like to take over the maintainer role for this part. > > Please let me know how to proceed and how we can be onboarded easily. > > Best regards, > > Sebastian > > --  > Sebastian Saemann > Head of Managed Services > > NETWAYS Managed Services GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg > Tel: +49 911 92885-0 | Fax: +49 911 92885-77 > CEO: Julian Hein, Bernd Erk | AG Nuernberg HRB25207 > https://netways.de | sebastian.saemann at netways.de > > ** NETWAYS Web Services - https://nws.netways.de ** — Slawek Kaplonski Principal software engineer Red Hat From cgoncalves at redhat.com Thu Jul 30 16:10:10 2020 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Thu, 30 Jul 2020 18:10:10 +0200 Subject: [octavia] Proposing Ann Taraday and Gregory Thiemonge as Octavia core reviewers In-Reply-To: References: Message-ID: +1. Excellent contributions by both of them -- thank you! On Thu, Jul 30, 2020 at 6:07 PM Michael Johnson wrote: > Hello Octavia community, > > I would like to propose Ann Taraday (ataraday_) and Gregory Thiemonge > (gthiemonge) as core reviewers on the Octavia project. > > Both Ann and Gregory have made significant contributions to the > Octavia code base and have provided quality code reviews. Over the > last two release cycles Ann has lead the addition of Taskflow jobboard > support to the amphora v2 driver. Gregory has worked on improving our > tempest scenario test coverage and enhancing the Octavia OpenStack > client plugin. > > I think that both would make excellent additions to the Octavia core > reviewer team. > > Existing Octavia core reviewers, please reply to this email with your > support or concerns with adding Ann and Gregory to the core team. > > Michael > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Thu Jul 30 16:11:02 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 30 Jul 2020 09:11:02 -0700 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Thank you for forwarding this Amy. Hi Moni, Can you check the Octavia API process log and the horizon log files to see what is causing the "Internal Server Error"? This sounds like a configuration file problem as Amy mentioned. There should be log messages in one of those two locations that will point to the problem. Michael On Thu, Jul 30, 2020 at 6:10 AM Amy Marrich wrote: > > Adding the discuss list where you might get more help, but also double check your config file for any extra spaces or typos. > > Thanks, > > Amy (spotz) > > On Thu, Jul 30, 2020 at 6:30 AM Monika Samal wrote: >> >> Hello All, >> >> I have been following https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html to deploy Octavia. 
I was successful in deployin Octavia but when I go to horizon dashboard and create loadbalancer am getting error "9876/v2.0/lbaas/loadbalancers, Internal Server Error". I have checked worker log at /var/log/kola-ansible/Octavia-worker.log and found oslo messaging was refusing connection I fixed it but still getting same error. Kindly help >> >> Regards, >> Moni >> _______________________________________________ >> Community mailing list >> Community at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community From german.eichberger at gmail.com Thu Jul 30 16:19:40 2020 From: german.eichberger at gmail.com (German Eichberger) Date: Thu, 30 Jul 2020 09:19:40 -0700 Subject: [octavia] Proposing Ann Taraday and Gregory Thiemonge as Octavia core reviewers Message-ID: +1. Great to see some new people. Excellent work so far. Date: Thu, 30 Jul 2020 18:10:10 +0200 From: Carlos Goncalves To: Michael Johnson Cc: openstack-discuss Subject: Re: [octavia] Proposing Ann Taraday and Gregory Thiemonge as Octavia core reviewers Message-ID: Content-Type: text/plain; charset="utf-8" +1. Excellent contributions by both of them -- thank you! On Thu, Jul 30, 2020 at 6:07 PM Michael Johnson wrote: > Hello Octavia community, > > I would like to propose Ann Taraday (ataraday_) and Gregory Thiemonge > (gthiemonge) as core reviewers on the Octavia project. > > Both Ann and Gregory have made significant contributions to the > Octavia code base and have provided quality code reviews. Over the > last two release cycles Ann has lead the addition of Taskflow jobboard > support to the amphora v2 driver. Gregory has worked on improving our > tempest scenario test coverage and enhancing the Octavia OpenStack > client plugin. > > I think that both would make excellent additions to the Octavia core > reviewer team. > > Existing Octavia core reviewers, please reply to this email with your > support or concerns with adding Ann and Gregory to the core team. > > Michael > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Jul 30 16:34:14 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 30 Jul 2020 16:34:14 +0000 Subject: [manila][infra] Please delete some branches (was [manila] Please delete some branches) In-Reply-To: References: Message-ID: <20200730163414.tiyrkbeio6y6kajj@yuggoth.org> On 2020-07-29 14:39:55 -0700 (-0700), Goutham Pacha Ravi wrote: > I'd like to request the deletion of some branches in manila that have now > transitioned to EOL. These branches can be removed from openstack/manila, > openstack/python-manilaclient and openstack/manila-ui: > > stable/pike > stable/ocata > > I'd also like to request the deletion of "driverfixes" branches from the > openstack/manila repository. These branches were created to host vendor > fixes to branches that were no longer being tested; however, with our > "extended maintenance" stance, we've effectively removed the need for these > branches. These branches will no longer be maintained, and so they can be > removed as well: > > driverfixes/mitaka > driverfixes/newton > driverfixes/ocata [...] 
I have manually deleted the following branches: openstack/manila driverfixes/mitaka 5ea4d16ba971f95746f3938702d21bf2175b3974 driverfixes/newton 55f82afe85a9ec3f03e328f8297914e3b5ccf2f2 driverfixes/ocata 80ff530e420e1080f61b5562196f3e73aad8f12b stable/ocata 0e9b76abc1d612cb13fa70d1fcd787c851d7a28a stable/pike 58911882ff380421709e260b1a28c1525fb6761e openstack/manila-ui stable/ocata a66432796de8ba36e1f726cd6e059288b5477ba1 stable/pike a3d20831b0ed2415b7c7f869f8b2fdb341013944 openstack/python-manilaclient stable/ocata 77d5aa9c74745a1a3e243d3abe0fd304c44567d6 stable/pike 7f52051e0195a74b805121611cc07bca04e622a1 Please double-check that everything is still in order following these deletions. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gouthampravi at gmail.com Thu Jul 30 17:40:21 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Thu, 30 Jul 2020 10:40:21 -0700 Subject: [manila][infra] Please delete some branches (was [manila] Please delete some branches) In-Reply-To: <20200730163414.tiyrkbeio6y6kajj@yuggoth.org> References: <20200730163414.tiyrkbeio6y6kajj@yuggoth.org> Message-ID: On Thu, Jul 30, 2020 at 9:39 AM Jeremy Stanley wrote: > On 2020-07-29 14:39:55 -0700 (-0700), Goutham Pacha Ravi wrote: > > I'd like to request the deletion of some branches in manila that have now > > transitioned to EOL. These branches can be removed from openstack/manila, > > openstack/python-manilaclient and openstack/manila-ui: > > > > stable/pike > > stable/ocata > > > > I'd also like to request the deletion of "driverfixes" branches from the > > openstack/manila repository. These branches were created to host vendor > > fixes to branches that were no longer being tested; however, with our > > "extended maintenance" stance, we've effectively removed the need for > these > > branches. These branches will no longer be maintained, and so they can be > > removed as well: > > > > driverfixes/mitaka > > driverfixes/newton > > driverfixes/ocata > [...] > > I have manually deleted the following branches: > > openstack/manila > driverfixes/mitaka 5ea4d16ba971f95746f3938702d21bf2175b3974 > driverfixes/newton 55f82afe85a9ec3f03e328f8297914e3b5ccf2f2 > driverfixes/ocata 80ff530e420e1080f61b5562196f3e73aad8f12b > stable/ocata 0e9b76abc1d612cb13fa70d1fcd787c851d7a28a > stable/pike 58911882ff380421709e260b1a28c1525fb6761e > > openstack/manila-ui > stable/ocata a66432796de8ba36e1f726cd6e059288b5477ba1 > stable/pike a3d20831b0ed2415b7c7f869f8b2fdb341013944 > > openstack/python-manilaclient > stable/ocata 77d5aa9c74745a1a3e243d3abe0fd304c44567d6 > stable/pike 7f52051e0195a74b805121611cc07bca04e622a1 > > Please double-check that everything is still in order following > these deletions. > Thanks a lot, Jeremy. Everything looks great! > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex.williamson at redhat.com Thu Jul 30 17:29:30 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Thu, 30 Jul 2020 11:29:30 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200730034104.GB32327@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> <20200717101258.65555978@x1.home> <20200721005113.GA10502@joy-OptiPlex-7040> <20200727072440.GA28676@joy-OptiPlex-7040> <20200727162321.7097070e@x1.home> <20200729080503.GB28676@joy-OptiPlex-7040> <20200729131255.68730f68@x1.home> <20200730034104.GB32327@joy-OptiPlex-7040> Message-ID: <20200730112930.6f4c5762@x1.home> On Thu, 30 Jul 2020 11:41:04 +0800 Yan Zhao wrote: > On Wed, Jul 29, 2020 at 01:12:55PM -0600, Alex Williamson wrote: > > On Wed, 29 Jul 2020 12:28:46 +0100 > > Sean Mooney wrote: > > > > > On Wed, 2020-07-29 at 16:05 +0800, Yan Zhao wrote: > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote: > > > > > On Mon, 27 Jul 2020 15:24:40 +0800 > > > > > Yan Zhao wrote: > > > > > > > > > > > > > As you indicate, the vendor driver is responsible for checking version > > > > > > > > information embedded within the migration stream. Therefore a > > > > > > > > migration should fail early if the devices are incompatible. Is it > > > > > > > > > > > > > > but as I know, currently in VFIO migration protocol, we have no way to > > > > > > > get vendor specific compatibility checking string in migration setup stage > > > > > > > (i.e. .save_setup stage) before the device is set to _SAVING state. > > > > > > > In this way, for devices who does not save device data in precopy stage, > > > > > > > the migration compatibility checking is as late as in stop-and-copy > > > > > > > stage, which is too late. > > > > > > > do you think we need to add the getting/checking of vendor specific > > > > > > > compatibility string early in save_setup stage? > > > > > > > > > > > > > > > > > > > hi Alex, > > > > > > after an offline discussion with Kevin, I realized that it may not be a > > > > > > problem if migration compatibility check in vendor driver occurs late in > > > > > > stop-and-copy phase for some devices, because if we report device > > > > > > compatibility attributes clearly in an interface, the chances for > > > > > > libvirt/openstack to make a wrong decision is little. > > > > > > > > > > I think it would be wise for a vendor driver to implement a pre-copy > > > > > phase, even if only to send version information and verify it at the > > > > > target. Deciding you have no device state to send during pre-copy does > > > > > not mean your vendor driver needs to opt-out of the pre-copy phase > > > > > entirely. Please also note that pre-copy is at the user's discretion, > > > > > we've defined that we can enter stop-and-copy at any point, including > > > > > without a pre-copy phase, so I would recommend that vendor drivers > > > > > validate compatibility at the start of both the pre-copy and the > > > > > stop-and-copy phases. > > > > > > > > > > > > > ok. got it! > > > > > > > > > > so, do you think we are now arriving at an agreement that we'll give up > > > > > > the read-and-test scheme and start to defining one interface (perhaps in > > > > > > json format), from which libvirt/openstack is able to parse and find out > > > > > > compatibility list of a source mdev/physical device? 
> > > > > > > > > > Based on the feedback we've received, the previously proposed interface > > > > > is not viable. I think there's agreement that the user needs to be > > > > > able to parse and interpret the version information. Using json seems > > > > > viable, but I don't know if it's the best option. Is there any > > > > > precedent of markup strings returned via sysfs we could follow? > > > > > > > > I found some examples of using formatted string under /sys, mostly under > > > > tracing. maybe we can do a similar implementation. > > > > > > > > #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format > > > > > > > > name: kvm_mmio > > > > ID: 32 > > > > format: > > > > field:unsigned short common_type; offset:0; size:2; signed:0; > > > > field:unsigned char common_flags; offset:2; size:1; signed:0; > > > > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > > > > field:int common_pid; offset:4; size:4; signed:1; > > > > > > > > field:u32 type; offset:8; size:4; signed:0; > > > > field:u32 len; offset:12; size:4; signed:0; > > > > field:u64 gpa; offset:16; size:8; signed:0; > > > > field:u64 val; offset:24; size:8; signed:0; > > > > > > > > print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" > > > > }, { 2, "write" }), REC->len, REC->gpa, REC->val > > > > > > > this is not json fromat and its not supper frendly to parse. > > > > > > > > #cat /sys/devices/pci0000:00/0000:00:02.0/uevent > > > > DRIVER=vfio-pci > > > > PCI_CLASS=30000 > > > > PCI_ID=8086:591D > > > > PCI_SUBSYS_ID=8086:2212 > > > > PCI_SLOT_NAME=0000:00:02.0 > > > > MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00 > > > > > > > this is ini format or conf formant > > > this is pretty simple to parse whichi would be fine. > > > that said you could also have a version or capablitiy directory with a file > > > for each key and a singel value. > > > > > > i would prefer to only have to do one read personally the list the files in > > > directory and then read tehm all ot build the datastucture myself but that is > > > doable though the simple ini format use d for uevent seams the best of 3 options > > > provided above. > > > > > > > > > > Your idea of having both a "self" object and an array of "compatible" > > > > > objects is perhaps something we can build on, but we must not assume > > > > > PCI devices at the root level of the object. Providing both the > > > > > mdev-type and the driver is a bit redundant, since the former includes > > > > > the latter. We can't have vendor specific versioning schemes though, > > > > > ie. gvt-version. We need to agree on a common scheme and decide which > > > > > fields the version is relative to, ex. just the mdev type? > > > > > > > > what about making all comparing fields vendor specific? > > > > userspace like openstack only needs to parse and compare if target > > > > device is within source compatible list without understanding the meaning > > > > of each field. > > > that kind of defeats the reason for having them be be parsable. > > > the reason openstack want to be able to understand the capablitys is so > > > we can staticaly declare the capablit of devices ahead of time on so our schduler > > > can select host based on that. is the keys and data are opaquce to userspace > > > becaue they are just random vendor sepecific blobs we cant do that. 
> > > > Agreed, I'm not sure I'm willing to rule out that there could be vendor > > specific direct match fields, as I included in my example earlier in > > the thread, but entirely vendor specific defeats much of the purpose > > here. > > > > > > > I had also proposed fields that provide information to create a > > > > > compatible type, for example to create a type_x2 device from a type_x1 > > > > > mdev type, they need to know to apply an aggregation attribute. If we > > > > > need to explicitly list every aggregation value and the resulting type, > > > > > I think we run aground of what aggregation was trying to avoid anyway, > > > > > so we might need to pick a language that defines variable substitution > > > > > or some kind of tagging. For example if we could define ${aggr} as an > > > > > integer within a specified range, then we might be able to define a type > > > > > relative to that value (type_x${aggr}) which requires an aggregation > > > > > attribute using the same value. I dunno, just spit balling. Thanks, > > > > > > > > what about a migration_compatible attribute under device node like > > > > below? > > > rather then listing comaptiable devices it would be better if you could declaritivly > > > list the feature supported and we could compare those along with a simple semver version string. > > > > > > > > #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible > > > > Note that we're defining compatibility relative to a vfio migration > > interface, so we should include that in the name, we don't know what > > other migration interfaces might exist. > do you mean we need to name it as vfio_migration, e.g. > /sys/bus/pci/devices/0000\:00\:02.0/UUID1/vfio_migration ? > > > > > > SELF: > > > > device_type=pci > > > > Why not the device_api here, ie. vfio-pci. The device doesn't provide > > a pci interface directly, it's wrapped in a vfio API. > > > the device_type is to indicate below device_id is a pci id. > > yes, include a device_api field is better. > for mdev, "device_type=vfio-mdev", is it right? No, vfio-mdev is not a device API, it's the driver that attaches to the mdev bus device to expose it through vfio. The device_api exposes the actual interface of the vfio device, it's also vfio-pci for typical mdev devices found on x86, but may be vfio-ccw, vfio-ap, etc... See VFIO_DEVICE_API_PCI_STRING and friends. > > > > device_id=8086591d > > > > Is device_id interpreted relative to device_type? How does this > > relate to mdev_type? If we have an mdev_type, doesn't that fully > > defined the software API? > > > it's parent pci id for mdev actually. If we need to specify the parent PCI ID then something is fundamentally wrong with the mdev_type. The mdev_type should define a unique, software compatible interface, regardless of the parent device IDs. If a i915-GVTg_V5_2 means different things based on the parent device IDs, then then different mdev_types should be reported for those parent devices. > > > > mdev_type=i915-GVTg_V5_2 > > > > And how are non-mdev devices represented? > > > non-mdev can opt to not include this field, or as you said below, a > vendor signature. > > > > > aggregator=1 > > > > pv_mode="none+ppgtt+context" > > > > These are meaningless vendor specific matches afaict. > > > yes, pv_mode and aggregator are vendor specific fields. > but they are important to decide whether two devices are compatible. > pv_mode means whether a vGPU supports guest paravirtualized api. 
> "none+ppgtt+context" means guest can not use pv, or use ppgtt mode pv or > use context mode pv. > > > > > interface_version=3 > > > > Not much granularity here, I prefer Sean's previous > > .[.bugfix] scheme. > > > yes, .[.bugfix] scheme may be better, but I'm not sure if > it works for a complicated scenario. > e.g for pv_mode, > (1) initially, pv_mode is not supported, so it's pv_mode=none, it's 0.0.0, > (2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0, > indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa. > (3) later, pv_mode=context is also supported, > pv_mode="none+ppgtt+context", so it's 0.2.0. > > But if later, pv_mode=ppgtt is removed. pv_mode="none+context", how to > name its version? "none+ppgtt" (0.1.0) is not compatible to > "none+context", but "none+ppgtt+context" (0.2.0) is compatible to > "none+context". If pv_mode=ppgtt is removed, then the compatible versions would be 0.0.0 or 1.0.0, ie. the major version would be incremented due to feature removal. > Maintain such scheme is painful to vendor driver. Migration compatibility is painful, there's no way around that. I think the version scheme is an attempt to push some of that low level burden on the vendor driver, otherwise the management tools need to work on an ever growing matrix of vendor specific features which is going to become unwieldy and is largely meaningless outside of the vendor driver. Instead, the vendor driver can make strategic decisions about where to continue to maintain a support burden and make explicit decisions to maintain or break compatibility. The version scheme is a simplification and abstraction of vendor driver features in order to create a small, logical compatibility matrix. Compromises necessarily need to be made for that to occur. > > > > COMPATIBLE: > > > > device_type=pci > > > > device_id=8086591d > > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > this mixed notation will be hard to parse so i would avoid that. > > > > Some background, Intel has been proposing aggregation as a solution to > > how we scale mdev devices when hardware exposes large numbers of > > assignable objects that can be composed in essentially arbitrary ways. > > So for instance, if we have a workqueue (wq), we might have an mdev > > type for 1wq, 2wq, 3wq,... Nwq. It's not really practical to expose a > > discrete mdev type for each of those, so they want to define a base > > type which is composable to other types via this aggregation. This is > > what this substitution and tagging is attempting to accomplish. So > > imagine this set of values for cases where it's not practical to unroll > > the values for N discrete types. > > > > > > aggregator={val1}/2 > > > > So the {val1} above would be substituted here, though an aggregation > > factor of 1/2 is a head scratcher... > > > > > > pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} > > > > I'm lost on this one though. I think maybe it's indicating that it's > > compatible with any of these, so do we need to list it? Couldn't this > > be handled by Sean's version proposal where the minor version > > represents feature compatibility? > yes, it's indicating that it's compatible with any of these. > Sean's version proposal may also work, but it would be painful for > vendor driver to maintain the versions when multiple similar features > are involved. This is something vendor drivers need to consider when adding and removing features. 
> > > > interface_version={val3:int:2,3} > > > > What does this turn into in a few years, 2,7,12,23,75,96,... > > > is a range better? I was really trying to point out that sparseness becomes an issue if the vendor driver is largely disconnected from how their feature addition and deprecation affects migration support. Thanks, Alex > > > > COMPATIBLE: > > > > device_type=pci > > > > device_id=8086591d > > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > > aggregator={val1}/2 > > > > pv_mode="" #"" meaning empty, could be absent in a compatible device > > > > interface_version=1 > > > > Why can't this be represented within the previous compatible > > description? > > > actually it can be merged with the previous one :) > But I guess there must be one that cannot merge, so put it as an > example to demo multiple COMPATIBLE objects. > > Thanks > Yan > > > > if you presented this information the only way i could see to use it would be to > > > extract the mdev_type name and interface_vertion and build a database table as follows > > > > > > source_mdev_type | source_version | target_mdev_type | target_version > > > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | {val3:int:2,3} > > > i915-GVTg_V5_2 | 3 | 915-GVTg_V5_{val1:int:1,2,4,8} | 1 > > > > > > this would either reuiqre use to use a post placment sechudler filter to itrospec this data base > > > or thansform the target_mdev_type and target_version colum data into CUSTOM_* traits we apply to > > > our placment resouce providers and we would have to prefrom multiple reuqest for each posible compatiable > > > alternitive. if the vm has muplite mdevs this is combinatorially problmenatic as it is 1 query for each > > > device * the number of possible compatible devices for that device. > > > > > > in other word if this is just opaque data we cant ever represent it efficently in our placment service and > > > have to fall back to an explisive post placment schdluer filter base on the db table approch. > > > > > > this also ignore the fact that at present the mdev_type cannot change druing a migration so the compatiable > > > devicve with a different mdev type would not be considerd accpetable choice in openstack. they way you select a host > > > with a specific vgpu mdev type today is to apply a custome trait which is CUSTOM_ to the vGPU > > > resouce provider and then in the flavor you request 1 allcoaton of vGPU and require the CUSTOM_ > > > trait. so going form i915-GVTg_V5_2 to i915-GVTg_V5_{val1:int:1,2,4,8} would not currently be compatiable with that > > > workflow. > > > > The latter would need to be parsed into: > > > > i915-GVTg_V5_1 > > i915-GVTg_V5_2 > > i915-GVTg_V5_4 > > i915-GVTg_V5_8 > > > > There is also on the table, migration from physical devices to mdev > > devices (or vice versa), which is not represented in these examples, > > nor do I see how we'd represent it. This is where I started exposing > > the resulting PCI device from the mdev in my example so we could have > > some commonality between devices, but the migration stream provider is > > just as important as the type of device, we could have different host > > drivers providing the same device with incompatible migration streams. > > The mdev_type encompasses both the driver and device, but we wouldn't > > have mdev_types for physical devices, per our current thinking. 
> > > > > > > > #cat /sys/bus/pci/dei915-GVTg_V5_{val1:int:1,2,4,8}vices/0000\:00\:i915- > > > > GVTg_V5_{val1:int:1,2,4,8}2.0/UUID2/migration_compatible > > > > SELF: > > > > device_type=pci > > > > device_id=8086591d > > > > mdev_type=i915-GVTg_V5_4 > > > > aggregator=2 > > > > interface_version=1 > > > > COMPATIBLE: > > > > device_type=pci > > > > device_id=8086591d > > > > mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8} > > > > aggregator={val1}/2 > > > > interface_version=1 > > > by the way this is closer to yaml format then it is to json but it does not align with any exsiting > > > format i know of so that just make the representation needless hard to consume > > > if we are going to use a markup lanag let use a standard one like yaml json or toml and not invent a new one. > > > > > > > > Notes: > > > > - A COMPATIBLE object is a line starting with COMPATIBLE. > > > > It specifies a list of compatible devices that are allowed to migrate > > > > in. > > > > The reason to allow multiple COMPATIBLE objects is that when it > > > > is hard to express a complex compatible logic in one COMPATIBLE > > > > object, a simple enumeration is still a fallback. > > > > in the above example, device UUID2 is in the compatible list of > > > > device UUID1, but device UUID1 is not in the compatible list of device > > > > UUID2, so device UUID2 is able to migrate to device UUID1, but device > > > > UUID1 is not able to migrate to device UUID2. > > > > > > > > - fields under each object are of "and" relationship to each other, meaning > > > > all fields of SELF object of a target device must be equal to corresponding > > > > fields of a COMPATIBLE object of source device, otherwise it is regarded as not > > > > compatible. > > > > > > > > - each field, however, is able to specify multiple allowed values, using > > > > variables as explained below. > > > > > > > > - variables are represented with {}, the first appearance of one variable > > > > specifies its type and allowed list. e.g. > > > > {val1:int:1,2,4,8} represents var1 whose type is integer and allowed > > > > values are 1, 2, 4, 8. > > > > > > > > - vendors are able to specify which fields are within the comparing list > > > > and which fields are not. e.g. for physical VF migration, it may not > > > > choose mdev_type as a comparing field, and maybe use driver name instead. > > > this format might be useful to vendors but from a orcestrator > > > perspecive i dont think this has value to us likely we would not use > > > this api if it was added as it does not help us with schduling. > > > ideally instead fo declaring which other mdev types a device is > > > compatiable with (which could presumably change over time as new > > > device and firmwares are released) i would prefer to see a > > > declaritive non vendor specific api that declares the feature set > > > provided by each mdev_type from which we can infer comaptiablity > > > similar to cpu feature flags. for devices fo the same mdev_type name > > > addtionally a declaritive version sting could also be used if > > > required for addtional compatiablity checks. > > > > "non vendor specific api that declares the feature set", aren't > > features generally vendor specific? What we're trying to describe is, > > by it's very nature, vendor specific. We don't have an ISO body > > defining a graphics adapter and enumerating features for that adapter. > > I think what we have is mdev_types. 
Each type is supposed to define a > > specific software interface, perhaps even more so than is done by a PCI > > vendor:device ID. Maybe that mdev_type needs to be abstracted as > > something more like a vendor signature, such that a physical device > > could provide or accept a vendor signature that's compatible with an > > mdev device. For example, a physically assigned Intel GPU might expose > > a migration signature of i915-GVTg_v5_8 if it were designed to be > > compatible with that mdev_type. Thanks, > > > > Alex > > > From monika.samal at outlook.com Thu Jul 30 19:34:25 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 30 Jul 2020 19:34:25 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID: Hello Michael and Amy, As suggested I checked and found below error on Octavia-api logs, but not sure how to resolve and proceed further with this. [cid:a29109eb-08ac-4f9a-92d4-2befba2cc90e] Regards, Moni ________________________________ From: Michael Johnson Sent: Thursday, July 30, 2020 9:41 PM To: Amy Marrich Cc: Monika Samal ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thank you for forwarding this Amy. Hi Moni, Can you check the Octavia API process log and the horizon log files to see what is causing the "Internal Server Error"? This sounds like a configuration file problem as Amy mentioned. There should be log messages in one of those two locations that will point to the problem. Michael On Thu, Jul 30, 2020 at 6:10 AM Amy Marrich wrote: > > Adding the discuss list where you might get more help, but also double check your config file for any extra spaces or typos. > > Thanks, > > Amy (spotz) > > On Thu, Jul 30, 2020 at 6:30 AM Monika Samal wrote: >> >> Hello All, >> >> I have been following https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html to deploy Octavia. I was successful in deployin Octavia but when I go to horizon dashboard and create loadbalancer am getting error "9876/v2.0/lbaas/loadbalancers, Internal Server Error". I have checked worker log at /var/log/kola-ansible/Octavia-worker.log and found oslo messaging was refusing connection I fixed it but still getting same error. Kindly help >> >> Regards, >> Moni >> _______________________________________________ >> Community mailing list >> Community at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 64201 bytes Desc: image.png URL: From dev.faz at gmail.com Thu Jul 30 19:57:28 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Thu, 30 Jul 2020 21:57:28 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: service_auth keystone_authtoken if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? Fabian Fabian Zimmermann schrieb am Do., 30. Juli 2020, 21:52: > Hi, > > check your keystone auth settings. > > If i remember correctly there are two different sections to which require > valid keystone settings. > > Fabian > > Monika Samal schrieb am Do., 30. 
Juli 2020, > 21:43: > >> Hello Michael and Amy, >> >> As suggested I checked and found below error on Octavia-api logs, but not >> sure how to resolve and proceed further with this. >> >> >> Regards, >> Moni >> ------------------------------ >> *From:* Michael Johnson >> *Sent:* Thursday, July 30, 2020 9:41 PM >> *To:* Amy Marrich >> *Cc:* Monika Samal ; openstack-discuss < >> openstack-discuss at lists.openstack.org>; community at lists.openstack.org < >> community at lists.openstack.org> >> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >> balancer >> >> Thank you for forwarding this Amy. >> >> Hi Moni, >> >> Can you check the Octavia API process log and the horizon log files to >> see what is causing the "Internal Server Error"? >> >> This sounds like a configuration file problem as Amy mentioned. There >> should be log messages in one of those two locations that will point >> to the problem. >> >> Michael >> >> On Thu, Jul 30, 2020 at 6:10 AM Amy Marrich wrote: >> > >> > Adding the discuss list where you might get more help, but also double >> check your config file for any extra spaces or typos. >> > >> > Thanks, >> > >> > Amy (spotz) >> > >> > On Thu, Jul 30, 2020 at 6:30 AM Monika Samal >> wrote: >> >> >> >> Hello All, >> >> >> >> I have been following >> https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html >> to deploy Octavia. I was successful in deployin Octavia but when I go to >> horizon dashboard and create loadbalancer am getting error >> "9876/v2.0/lbaas/loadbalancers, Internal Server Error". I have checked >> worker log at /var/log/kola-ansible/Octavia-worker.log and found oslo >> messaging was refusing connection I fixed it but still getting same error. >> Kindly help >> >> >> >> Regards, >> >> Moni >> >> _______________________________________________ >> >> Community mailing list >> >> Community at lists.openstack.org >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Thu Jul 30 20:08:33 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Thu, 30 Jul 2020 22:08:33 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: The sections should be service_auth keystone_authtoken if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Thu Jul 30 20:09:01 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 30 Jul 2020 13:09:01 -0700 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Yep, check the [service_auth], [keystone_authtoken], and [neutron] sections to make sure they are configured correctly for your environment/keystone setup. An example configuration file from our testing gates is here: https://6c99932face18597ac21-e78425eb2af28756e9f2701e93cfe670.ssl.cf1.rackcdn.com/738292/9/check/octavia-v2-dsvm-scenario/abac8cf/controller/logs/etc/octavia/octavia_conf.txt The documentation for the settings is here: https://docs.openstack.org/octavia/latest/configuration/configref.html Michael On Thu, Jul 30, 2020 at 12:57 PM Fabian Zimmermann wrote: > service_auth > keystone_authtoken > > if i read the docs correctly. 
Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > Fabian > > Fabian Zimmermann schrieb am Do., 30. Juli 2020, > 21:52: > >> Hi, >> >> check your keystone auth settings. >> >> If i remember correctly there are two different sections to which require >> valid keystone settings. >> >> Fabian >> >> Monika Samal schrieb am Do., 30. Juli 2020, >> 21:43: >> >>> Hello Michael and Amy, >>> >>> As suggested I checked and found below error on Octavia-api logs, but >>> not sure how to resolve and proceed further with this. >>> >>> >>> Regards, >>> Moni >>> ------------------------------ >>> *From:* Michael Johnson >>> *Sent:* Thursday, July 30, 2020 9:41 PM >>> *To:* Amy Marrich >>> *Cc:* Monika Samal ; openstack-discuss < >>> openstack-discuss at lists.openstack.org>; community at lists.openstack.org < >>> community at lists.openstack.org> >>> *Subject:* Re: [openstack-community] Octavia :; Unable to create load >>> balancer >>> >>> Thank you for forwarding this Amy. >>> >>> Hi Moni, >>> >>> Can you check the Octavia API process log and the horizon log files to >>> see what is causing the "Internal Server Error"? >>> >>> This sounds like a configuration file problem as Amy mentioned. There >>> should be log messages in one of those two locations that will point >>> to the problem. >>> >>> Michael >>> >>> On Thu, Jul 30, 2020 at 6:10 AM Amy Marrich wrote: >>> > >>> > Adding the discuss list where you might get more help, but also double >>> check your config file for any extra spaces or typos. >>> > >>> > Thanks, >>> > >>> > Amy (spotz) >>> > >>> > On Thu, Jul 30, 2020 at 6:30 AM Monika Samal >>> wrote: >>> >> >>> >> Hello All, >>> >> >>> >> I have been following >>> https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html >>> to deploy Octavia. I was successful in deployin Octavia but when I go to >>> horizon dashboard and create loadbalancer am getting error >>> "9876/v2.0/lbaas/loadbalancers, Internal Server Error". I have checked >>> worker log at /var/log/kola-ansible/Octavia-worker.log and found oslo >>> messaging was refusing connection I fixed it but still getting same error. >>> Kindly help >>> >> >>> >> Regards, >>> >> Moni >>> >> _______________________________________________ >>> >> Community mailing list >>> >> Community at lists.openstack.org >>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Thu Jul 30 20:27:25 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Thu, 30 Jul 2020 22:27:25 +0200 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Hi, just to debug, could you replace the auth_type password with v3password? And do a curl against your :5000 and :35357 urls and paste the output. Fabian Monika Samal schrieb am Do., 30. Juli 2020, 22:15: > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ------------------------------ > *From:* Fabian Zimmermann > *Sent:* Friday, July 31, 2020 1:38 AM > *To:* Monika Samal > *Cc:* Michael Johnson ; Amy Marrich ; > openstack-discuss ; > community at lists.openstack.org > *Subject:* Re: [openstack-community] Octavia :; Unable to create load > balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. 
Maybe you can just paste your config > (remove/change passwords) to paste.openstack.org and post the link? > > Fabian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Thu Jul 30 20:52:40 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 30 Jul 2020 16:52:40 -0400 Subject: [tc] monthly meeting Message-ID: Hi everyone, Our monthly TC meeting is scheduled for next Thursday, August 6th, at 1400 UTC. If you would like to add topics for discussion, please go to https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting and fill out your suggestions by Wednesday, August 5, at 1900 UTC. Thank you, Regards, -- Mohammed Naser VEXXHOST, Inc. From jimmy at openstack.org Thu Jul 30 22:19:46 2020 From: jimmy at openstack.org (Jimmy McArthur) Date: Thu, 30 Jul 2020 17:19:46 -0500 Subject: The Open Infrastructure Summit is Going Virtual! In-Reply-To: <1fd2cd70-1b9b-47d1-9236-97673247f295@debian.org> References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> <28b311a0-0de8-2929-fc7b-4fc513977204@openstack.org> <4e722cd6-9ff7-ac76-03f3-61c352d96801@openstack.org> <1fd2cd70-1b9b-47d1-9236-97673247f295@debian.org> Message-ID: <6aeef216-01c2-c017-5b13-b0baebfe0d92@openstack.org> Thomas, Should be all set on this one too.  Thanks again for the report! Cheers, Jimmy Thomas Goirand wrote on 7/28/20 2:04 AM: > On 7/28/20 1:23 AM, Jimmy McArthur wrote: >> Jimmy McArthur wrote on 7/27/20 10:54 AM: >>> That does indeed sound like a bug.  Let me test with your account and >>> I'll update ASAP. >>> >>> Cheers, >>> Jimmy >>> >>> Thomas Goirand wrote on 7/27/20 2:08 AM: >>>> Well, there's a bug then... >>>> >>>> When I got to: >>>> https://cfp.openstack.org/app/profile >>>> >>>> under the Email field, it displays tho%2A%2A%40goirand.fr which I cannot >>>> edit. Then when I click on SAVE, I'm being told that the email isn't a >>>> valid one (but I cannot edit it...). >>>> >>>> As a result, I can never save my updated bio... >> Thomas, >> >> We've pushed a fix for this. Please let us know if you have any further >> trouble. >> >> Thank you! >> Jimmy > This worked, thanks Jimmy! > > One last bug though: I can't select anything in "What is your current > Organizational Role at your company? (check all that apply):" (ie: when > I click, nothing happens... checkboxes stay untick). > > Cheers, > > Thomas Goirand (zigo) > From johnsomor at gmail.com Thu Jul 30 22:27:01 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 30 Jul 2020 15:27:01 -0700 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: Message-ID: Just to close the loop on this, the octavia.conf file had "project_name = admin" instead of "project_name = service" in the [service_auth] section. This was causing the keystone errors when Octavia was communicating with neutron. I don't know if that is a bug in kolla-ansible or was just a local configuration issue. 
Michael On Thu, Jul 30, 2020 at 1:39 PM Monika Samal wrote: > > Hello Fabian,, > > http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ > > Regards, > Monika > ________________________________ > From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:57 AM > To: Monika Samal > Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > Hi, > > just to debug, could you replace the auth_type password with v3password? > > And do a curl against your :5000 and :35357 urls and paste the output. > > Fabian > > Monika Samal schrieb am Do., 30. Juli 2020, 22:15: > > Hello Fabian, > > http://paste.openstack.org/show/796477/ > > Thanks, > Monika > ________________________________ > From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:38 AM > To: Monika Samal > Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer > > The sections should be > > service_auth > keystone_authtoken > > if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? > > Fabian From monika.samal at outlook.com Thu Jul 30 19:41:42 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 30 Jul 2020 19:41:42 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , , Message-ID: ++ In addition to my previous mail I also checked keystone.log, below is the Screenshot [cid:04102590-befc-41c8-bae1-76e9e787749a] ________________________________ From: Monika Samal Sent: Friday, July 31, 2020 1:04 AM To: Michael Johnson ; Amy Marrich Cc: openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Hello Michael and Amy, As suggested I checked and found below error on Octavia-api logs, but not sure how to resolve and proceed further with this. [cid:a29109eb-08ac-4f9a-92d4-2befba2cc90e] Regards, Moni ________________________________ From: Michael Johnson Sent: Thursday, July 30, 2020 9:41 PM To: Amy Marrich Cc: Monika Samal ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer Thank you for forwarding this Amy. Hi Moni, Can you check the Octavia API process log and the horizon log files to see what is causing the "Internal Server Error"? This sounds like a configuration file problem as Amy mentioned. There should be log messages in one of those two locations that will point to the problem. Michael On Thu, Jul 30, 2020 at 6:10 AM Amy Marrich wrote: > > Adding the discuss list where you might get more help, but also double check your config file for any extra spaces or typos. > > Thanks, > > Amy (spotz) > > On Thu, Jul 30, 2020 at 6:30 AM Monika Samal wrote: >> >> Hello All, >> >> I have been following https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html to deploy Octavia. I was successful in deployin Octavia but when I go to horizon dashboard and create loadbalancer am getting error "9876/v2.0/lbaas/loadbalancers, Internal Server Error". I have checked worker log at /var/log/kola-ansible/Octavia-worker.log and found oslo messaging was refusing connection I fixed it but still getting same error. 
Kindly help >> >> Regards, >> Moni >> _______________________________________________ >> Community mailing list >> Community at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community
From monika.samal at outlook.com Thu Jul 30 19:55:07 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 30 Jul 2020 19:55:07 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID:
Hey Fabian, Can you please elaborate more on the keystone settings, i.e. where I need to make changes? I am new to OpenStack and finding it difficult to resolve this; I am kind of stuck.
________________________________ From: Fabian Zimmermann Sent: Friday, July 31, 2020 1:22 AM To: Monika Samal Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer
Hi, check your keystone auth settings. If I remember correctly there are two different sections which require valid keystone settings. Fabian
Monika Samal > wrote on Thu, 30 Jul 2020, 21:43: Hello Michael and Amy, As suggested I checked and found below error on Octavia-api logs, but not sure how to resolve and proceed further with this. [cid:a29109eb-08ac-4f9a-92d4-2befba2cc90e] Regards, Moni
________________________________ From: Michael Johnson > Sent: Thursday, July 30, 2020 9:41 PM To: Amy Marrich > Cc: Monika Samal >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer
Thank you for forwarding this Amy. Hi Moni, Can you check the Octavia API process log and the horizon log files to see what is causing the "Internal Server Error"? This sounds like a configuration file problem as Amy mentioned. There should be log messages in one of those two locations that will point to the problem. Michael
On Thu, Jul 30, 2020 at 6:10 AM Amy Marrich > wrote: > > Adding the discuss list where you might get more help, but also double check your config file for any extra spaces or typos. > > Thanks, > > Amy (spotz) > > On Thu, Jul 30, 2020 at 6:30 AM Monika Samal > wrote: >> >> Hello All, >> >> I have been following https://docs.openstack.org/kolla-ansible/latest/reference/networking/octavia.html to deploy Octavia. I was successful in deployin Octavia but when I go to horizon dashboard and create loadbalancer am getting error "9876/v2.0/lbaas/loadbalancers, Internal Server Error". I have checked worker log at /var/log/kola-ansible/Octavia-worker.log and found oslo messaging was refusing connection I fixed it but still getting same error. Kindly help >> >> Regards, >> Moni >> _______________________________________________ >> Community mailing list >> Community at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/community
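The "two different sections" Fabian mentions are, in a typical Octavia deployment, [keystone_authtoken] (used by the Octavia API to validate incoming tokens) and [service_auth] (used by Octavia when it calls other services such as Neutron); both need working Keystone credentials. A quick, rough way to sanity-check that the Keystone endpoints referenced in those sections are reachable at all is to fetch their version documents. The hostname below is a placeholder, and the 35357 admin port only applies to deployments that still expose it:

    # Placeholders -- substitute the deployment's actual Keystone endpoints.
    curl -s http://controller-vip:5000/v3/ | python3 -m json.tool
    curl -s http://controller-vip:35357/v3/ | python3 -m json.tool

A reachable endpoint returns a small JSON document describing the v3 API; a connection error here points at networking or endpoint configuration rather than at the Octavia settings themselves.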
From monika.samal at outlook.com Thu Jul 30 20:15:51 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 30 Jul 2020 20:15:51 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID:
Hello Fabian, http://paste.openstack.org/show/796477/ Thanks, Monika
________________________________ From: Fabian Zimmermann Sent: Friday, July 31, 2020 1:38 AM To: Monika Samal Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer
The sections should be service_auth keystone_authtoken if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? Fabian
From monika.samal at outlook.com Thu Jul 30 20:39:46 2020 From: monika.samal at outlook.com (Monika Samal) Date: Thu, 30 Jul 2020 20:39:46 +0000 Subject: [openstack-community] Octavia :; Unable to create load balancer In-Reply-To: References: , Message-ID:
Hello Fabian, http://paste.openstack.org/show/QxKv2Ai697qulp9UWTjY/ Regards, Monika
________________________________ From: Fabian Zimmermann Sent: Friday, July 31, 2020 1:57 AM To: Monika Samal Cc: Michael Johnson ; Amy Marrich ; openstack-discuss ; community at lists.openstack.org Subject: Re: [openstack-community] Octavia :; Unable to create load balancer
Hi, just to debug, could you replace the auth_type password with v3password? And do a curl against your :5000 and :35357 urls and paste the output. Fabian
Monika Samal > wrote on Thu, 30 Jul 2020, 22:15: Hello Fabian, http://paste.openstack.org/show/796477/ Thanks, Monika ________________________________ From: Fabian Zimmermann > Sent: Friday, July 31, 2020 1:38 AM To: Monika Samal > Cc: Michael Johnson >; Amy Marrich >; openstack-discuss >; community at lists.openstack.org > Subject: Re: [openstack-community] Octavia :; Unable to create load balancer The sections should be service_auth keystone_authtoken if i read the docs correctly. Maybe you can just paste your config (remove/change passwords) to paste.openstack.org and post the link? Fabian
From ngtech1ltd at gmail.com Thu Jul 30 21:51:49 2020 From: ngtech1ltd at gmail.com (Eliezer Croitor) Date: Fri, 31 Jul 2020 00:51:49 +0300 Subject: Looking for recommendation what OS to use for a minimal installation In-Reply-To: References: <000301d66568$4a80d690$df8283b0$@gmail.com> Message-ID: <003e01d666bb$a1cf1930$e56d4b90$@gmail.com>
Now I have hit another bug with MariaDB 10.4.13: https://jira.mariadb.org/browse/MDEV-22563 I am now amazed by these bugs.. Eliezer ---- Eliezer Croitoru Tech Support Mobile: +972-5-28704261 Email: ngtech1ltd at gmail.com
From: Fabian Zimmermann Sent: Wednesday, July 29, 2020 5:24 PM To: Eliezer Croitor Cc: openstack-discuss Subject: Re: Looking for recommendation what OS to use for a minimal installation
Hi, Eliezer Croitor > wrote on Wed, 29 Jul 2020, 15:30: Hey Everybody, Any recommendation where to start? Would suggest to use kolla-ansible, but this needs a bit container / ansible knowhow. What OS to use? CentOS seems fine, but im using Ubuntu. Should not be an issue at all if you use containers. Fabian -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rosmaita.fossdev at gmail.com Fri Jul 31 00:25:13 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 30 Jul 2020 20:25:13 -0400 Subject: [Glance] Proposing Dan Smith for glance core In-Reply-To: References: Message-ID: <1b292658-2c78-b180-6fa8-19eafc1b49ed@gmail.com> On 7/30/20 11:25 AM, Abhishek Kekane wrote: > Hi All, > > I'd like to propose adding Dan Smith to the glance core group. > > Dan Smith has contributed to stabilize image import workflow as well as > multiple stores of glance. > He is also contributing in tempest and nova to set up CI/tempest jobs > around image import and multiple stores. > > Being involved on the mailing-list and IRC channels, Dan is always > helpful to the community and here to help. > > Please respond with +1/-1 until 03rd August,2020 1400 UTC. Dan's been doing some great work for Glance. +1 from me. > > Cheers, > Abhishek From rosmaita.fossdev at gmail.com Fri Jul 31 00:27:32 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 30 Jul 2020 20:27:32 -0400 Subject: [cinder] new driver merge deadline has passed Message-ID: <2ff0b3cb-c54f-ca42-93dc-9796efa0c15b@gmail.com> This is a notification that the Cinder new driver merge deadline for Victoria has passed [0]. At yesterday's Cinder team meeting, we discussed Walt's work on the ceph-iscsi driver [1], and the team came to a consensus that we will extend the deadline *for this driver only* into the R-9 week (cinder midcycle meeting 2 week), which will make the deadline for the ceph-iscsi driver Thursday 13 August 23:59 UTC. This is an extraordinary step, as traditionally Cinder has strictly enforced the new driver merge deadline. The reasons for this exception are: - this is a community driver - the check/gate jobs for this driver will be run from the cinder zuul configuration, not a third-party CI system, so the cinder team has full transparency into the working state of the CI system I want to emphasize again that this is an extraordinary event, and as such, should not be expected to set a precedent beyond the Victoria development cycle. And, of course, this deadline extension is not a guarantee that the driver will be included in the Victoria release. Thank you for your attention to this matter, brian [0] https://releases.openstack.org/victoria/schedule.html#v-cinder-driver-deadline [1] https://review.opendev.org/#/q/(topic:ceph-iscsi-zuul+OR+topic:ceph-iscsi)+(status:open+OR+status:merged) From lijie at unitedstack.com Fri Jul 31 09:23:25 2020 From: lijie at unitedstack.com (=?utf-8?B?UmFtYm8=?=) Date: Fri, 31 Jul 2020 17:23:25 +0800 Subject: [nova] If any spec freeze exception now? Message-ID: Hi,all:         I have a spec which is support volume backed server rebuild[0].This spec was accepted in Stein, but some of the work did not finish, so repropose it for Victoria.And this spec is depend on the cinder reimage api [1], now the reimage api is almost all completed. So I sincerely wish this spec will approved in Victoria. If this spec is approved, I will achieve it at once. Ref: [0]:https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild [1]:https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api Best Regards Rambo -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Fri Jul 31 10:54:02 2020 From: smooney at redhat.com (Sean Mooney) Date: Fri, 31 Jul 2020 11:54:02 +0100 Subject: [nova] If any spec freeze exception now? 
In-Reply-To: References: Message-ID:
On Fri, 2020-07-31 at 17:23 +0800, Rambo wrote: > Hi,all: >         I have a spec which is support volume backed server rebuild[0].This spec was accepted in > Stein, but some of the work did not finish, so repropose it for Victoria.And this spec is depend on the cinder reimage > api [1], now the reimage api is almost all completed. So I sincerely wish this spec will approved in Victoria. If this > spec is approved, I will achieve it at once. > > >
We have a couple of specs that are close to being approved that currently are not. I think that if the remaining work can realistically be done before feature freeze, this would be a good candidate for a spec approval exception; however, I did not attend the last nova meeting, so I don't know if a decision was made or not. Normally we have a short window where such exceptions can be granted, so I think you should add your spec to the agenda for the nova meeting next week and ask for it to be approved. https://review.opendev.org/#/c/739349/ is the actual review, correct? I'll take a look at the spec in general, but if I recall correctly I was OK with the Stein version, so if it is largely unchanged I would probably +1 it. We have another spec proposal requesting that we support multiple bootable volumes and allow the boot order to be set, which feels like it could be better addressed by a rebuild instead, so I think this has a lot of value.
> > > > > > > Ref: > > [0]:https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild > [1]:https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api > > > Best Regards > Rambo
From sean.mcginnis at gmx.com Fri Jul 31 13:10:50 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 31 Jul 2020 08:10:50 -0500 Subject: [Glance] Proposing Dan Smith for glance core In-Reply-To: References: Message-ID: <8beebf2c-88de-5996-6da6-30401145307f@gmx.com>
On 7/30/20 10:25 AM, Abhishek Kekane wrote: > Hi All, > > I'd like to propose adding Dan Smith to the glance core group. > > Dan Smith has contributed to stabilize image import workflow as well > as multiple stores of glance. > He is also contributing in tempest and nova to set up CI/tempest jobs > around image import and multiple stores. > > Being involved on the mailing-list and IRC channels, Dan is always > helpful to the community and here to help. > > Please respond with +1/-1 until 03rd August,2020 1400 UTC. > > Cheers, > Abhishek +1
From jungleboyj at gmail.com Fri Jul 31 15:39:05 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Fri, 31 Jul 2020 10:39:05 -0500 Subject: [Glance] Proposing Dan Smith for glance core In-Reply-To: <8beebf2c-88de-5996-6da6-30401145307f@gmx.com> References: <8635120d-11d6-136e-2581-40d3d451d1aa@gmail.com> Message-ID: <8635120d-11d6-136e-2581-40d3d451d1aa@gmail.com>
On 7/31/2020 8:10 AM, Sean McGinnis wrote: > On 7/30/20 10:25 AM, Abhishek Kekane wrote: >> Hi All, >> >> I'd like to propose adding Dan Smith to the glance core group. >> >> Dan Smith has contributed to stabilize image import workflow as well >> as multiple stores of glance. >> He is also contributing in tempest and nova to set up CI/tempest jobs >> around image import and multiple stores. >> >> Being involved on the mailing-list and IRC channels, Dan is always >> helpful to the community and here to help. >> >> Please respond with +1/-1 until 03rd August,2020 1400 UTC. >> >> Cheers, >> Abhishek > > +1 > Not a Glance core but definitely +1 from me.
From katonalala at gmail.com Fri Jul 31 06:33:56 2020 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 31 Jul 2020 08:33:56 +0200 Subject: networking-l2gw In-Reply-To: <202007301157342887612@bonc.com.cn> References: <202007301157342887612@bonc.com.cn> Message-ID:
Hi, May I ask if you have checked this section of the README: https://opendev.org/openstack/networking-l2gw/src/branch/master/README.rst#user-content-getting-started There you can find a quite good introduction on how to set up OVSDB on Linux if you just want to try out networking-l2gw. Please use English next time in your mail to openstack-discuss. To tell the truth, I have not translated it to Hungarian, though that is my native language :-) So my answer is more of a guess at how I can help you. Regards Lajos
liuzhenjie at bonc.com.cn wrote on Thu, 30 Jul 2020 at 17:53 (translated here from the original Chinese): > Hello: > I would like to use the l2gw service. > However, my environment is deployed with kolla-ansible. After this service comes up inside the neutron-server container, the following error is reported when creating resources (a screenshot was attached): > > My OVSDB server is an x86 server with OVS installed. > After the l2gw service comes up, neutron-server does not establish an OVSDB connection to that x86 server. > > ----- > Liu Zhenjie > BONC (北京东方国信科技股份有限公司), Data Science Division - Public Cloud R&D Center > Landline: 0108486-6996 | Mobile: 15689961523 > Email: liuzhenjie at bonc.com.cn | Web: http://www.bonc.com.cn > Address: BONC Building, Yard 1, Chuangda 3rd Road, Chaoyang District, Beijing
From victoria at vmartinezdelacruz.com Fri Jul 31 17:05:47 2020 From: victoria at vmartinezdelacruz.com (=?UTF-8?Q?Victoria_Mart=C3=ADnez_de_la_Cruz?=) Date: Fri, 31 Jul 2020 14:05:47 -0300 Subject: [manila] Doc-a-thon event coming up next Thursday (Aug 6th) Message-ID:
Hi folks, We will be organizing a doc-a-thon next Thursday, August 6th, with the main goal of improving our docs for the next release. We will be gathering on our Freenode channel #openstack-manila after our weekly meeting (3pm UTC) and also using a videoconference tool (exact details TBC) to go over a curated list of open doc bugs we have here [0]. *Your* participation is truly valued, whether you are already a Manila contributor or you are interested in contributing and didn't know how, so we are looking forward to seeing you there :) Cheers, Victoria [0] https://ethercalc.openstack.org/ur17jprbprxx
From kgiusti at gmail.com Fri Jul 31 18:15:08 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Fri, 31 Jul 2020 14:15:08 -0400 Subject: [all][oslo] Announcing the retirement of devstack-plugin-pika Message-ID:
Hi all, The Oslo team is announcing the retirement of the devstack-plugin-pika project [0]. This plugin was introduced as part of the testing infrastructure for a new transport driver based on the Pika client library for RabbitMQ [1]. This driver was developed as an alternative to the existing Kombu-based driver. Testing of the new driver failed to show a meaningful improvement over the existing Kombu driver, so development was dropped. The Pika driver has long been removed from the Oslo.Messaging project repo.
The Oslo team is unaware of any deployments making use of it. Patches to remove this project are forthcoming and will use the topic 'retire-devstack-plugin-pika'. Thanks, [0] https://opendev.org/openstack/devstack-plugin-pika [1] https://github.com/pika/pika -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rfolco at redhat.com Fri Jul 31 18:19:21 2020 From: rfolco at redhat.com (Rafael Folco) Date: Fri, 31 Jul 2020 15:19:21 -0300 Subject: [tripleo] TripleO CI Summary: Unified Sprint 30 Message-ID: Greetings, The TripleO CI team has just completed **Unified Sprint 30** (July 09 thru July 29). The following is a summary of completed work during this sprint cycle [1]: - Continued building internal component and integration pipelines for rhos-17 and rhos-16.2. - Switched promoter tests to run on Python3 and adapted molecule scenarios to the new test sequence standard. - Merged QCOW2 promotions and the new configuration engine in the promoter code. - All patches for CentOS-7 -> CentOS-8 stable/train upstream migration are up and merging / under review - https://review.opendev.org/#/q/topic:c7-to-c8-train+(status:open+OR+status:merged) - CentOS8 component and integration pipelines are done. - Improved Tempest skip list is now in production - https://opendev.org/openstack/openstack-tempest-skiplist - Design improvements to the Tempest scenario manager. - https://etherpad.opendev.org/p/tempest-scenario-manager - Fixed Libvirt bug on CI Zuul reproducer. - Ruck/Rover recorded notes [2]. - Vexxhost jobs have been turned back on for 3rd party CI, mixed results thus far w/ infrastructure stability and quality - Container-pull issues this sprint. Thanks to Alex for debugging w/ the infra team to help resolve the issue https://bugs.launchpad.net/tripleo/+bug/1889122 https://review.opendev.org/#/c/743629/ The planned work for the next sprint extends the work started in the previous sprint and focuses on downstream OSP 16.2 pipeline and on the next-gen promoter changes. The Ruck and Rover for this sprint are Marios Andreou (marios), Chandan Kumar (raukadah). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes to be tracked in hackmd [3]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-30 [2] https://hackmd.io/6Bx0FXwlRNCc75l39NSKvg [3] https://hackmd.io/QnprH9-yRTi6uWlEfaahoQ -- Folco -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgiusti at gmail.com Fri Jul 31 18:33:12 2020 From: kgiusti at gmail.com (Ken Giusti) Date: Fri, 31 Jul 2020 14:33:12 -0400 Subject: [all][oslo] Announcing the retirement of devstack-plugin-zmq Message-ID: Hi all, The Oslo team is announcing the retirement of the devstack-plugin-zmq project [0]. This plugin was introduced as part of the testing infrastructure for a new transport driver based on the ZeroMQ protocol [1]. The ZeroMQ driver has long been removed from the Oslo.Messaging project repo. The Oslo team is unaware of any deployments making use of it. Patches to remove this project are forthcoming and will use the topic 'retire-devstack-plugin-zmq'. Thanks, [0] https://opendev.org/openstack/devstack-plugin-zmq [1] https://zeromq.org/ -- Ken Giusti (kgiusti at gmail.com) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From flux.adam at gmail.com Fri Jul 31 20:48:17 2020 From: flux.adam at gmail.com (Adam Harwell) Date: Fri, 31 Jul 2020 13:48:17 -0700 Subject: [octavia] Proposing Ann Taraday and Gregory Thiemonge as Octavia core reviewers In-Reply-To: References: Message-ID: +1 congrats, now go do more reviews! :D On Thu, Jul 30, 2020, 09:25 German Eichberger wrote: > +1. Great to see some new people. Excellent work so far. > > Date: Thu, 30 Jul 2020 18:10:10 +0200 > From: Carlos Goncalves > To: Michael Johnson > Cc: openstack-discuss > Subject: Re: [octavia] Proposing Ann Taraday and Gregory Thiemonge as > Octavia core reviewers > Message-ID: > < > CAM7b86j1wr0jfzxfcCjy-yNZZ5oA+iN0z_TPF1suUCmbDqwfPg at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > +1. Excellent contributions by both of them -- thank you! > > On Thu, Jul 30, 2020 at 6:07 PM Michael Johnson > wrote: > > > Hello Octavia community, > > > > I would like to propose Ann Taraday (ataraday_) and Gregory Thiemonge > > (gthiemonge) as core reviewers on the Octavia project. > > > > Both Ann and Gregory have made significant contributions to the > > Octavia code base and have provided quality code reviews. Over the > > last two release cycles Ann has lead the addition of Taskflow jobboard > > support to the amphora v2 driver. Gregory has worked on improving our > > tempest scenario test coverage and enhancing the Octavia OpenStack > > client plugin. > > > > I think that both would make excellent additions to the Octavia core > > reviewer team. > > > > Existing Octavia core reviewers, please reply to this email with your > > support or concerns with adding Ann and Gregory to the core team. > > > > Michael > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Fri Jul 31 21:28:59 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 31 Jul 2020 14:28:59 -0700 Subject: [octavia] Proposing Ann Taraday and Gregory Thiemonge as Octavia core reviewers In-Reply-To: References: Message-ID: The e-mail thread got split, but we have a quorum of Octavia core reviewers in favor of adding Ann and Gregory as Octavia core reviewers. Congratulations! Michael On Thu, Jul 30, 2020 at 9:10 AM Carlos Goncalves wrote: > > +1. Excellent contributions by both of them -- thank you! > > On Thu, Jul 30, 2020 at 6:07 PM Michael Johnson wrote: >> >> Hello Octavia community, >> >> I would like to propose Ann Taraday (ataraday_) and Gregory Thiemonge >> (gthiemonge) as core reviewers on the Octavia project. >> >> Both Ann and Gregory have made significant contributions to the >> Octavia code base and have provided quality code reviews. Over the >> last two release cycles Ann has lead the addition of Taskflow jobboard >> support to the amphora v2 driver. Gregory has worked on improving our >> tempest scenario test coverage and enhancing the Octavia OpenStack >> client plugin. >> >> I think that both would make excellent additions to the Octavia core >> reviewer team. >> >> Existing Octavia core reviewers, please reply to this email with your >> support or concerns with adding Ann and Gregory to the core team. >> >> Michael >> From zigo at debian.org Fri Jul 31 22:11:45 2020 From: zigo at debian.org (Thomas Goirand) Date: Sat, 1 Aug 2020 00:11:45 +0200 Subject: The Open Infrastructure Summit is Going Virtual! 
In-Reply-To: <6aeef216-01c2-c017-5b13-b0baebfe0d92@openstack.org> References: <678d9ea6-22f0-e454-ebbf-7116501df65e@debian.org> <3a595c31-5be0-b5d0-b529-1cec1abca03a@debian.org> <28b311a0-0de8-2929-fc7b-4fc513977204@openstack.org> <4e722cd6-9ff7-ac76-03f3-61c352d96801@openstack.org> <1fd2cd70-1b9b-47d1-9236-97673247f295@debian.org> <6aeef216-01c2-c017-5b13-b0baebfe0d92@openstack.org> Message-ID: <672f7472-e561-cd1b-1643-1fe9eef7ab63@debian.org> On 7/31/20 12:19 AM, Jimmy McArthur wrote: > Thomas, > > Should be all set on this one too.  Thanks again for the report! > > Cheers, > Jimmy Thanks a lot for these 2 fixes. Thomas