From amotoki at gmail.com Wed Jul 1 02:38:51 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 1 Jul 2020 11:38:51 +0900 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: On Wed, Jun 24, 2020 at 10:22 PM Rodolfo Alonso Hernandez wrote: > > Hello all: > > Along this years we have increased the number of DB migrations each time we needed a new DB schema. This is good because that means the project is evolving and adding new features. > > Although this is not a problem per se, there are some inconvenients: > - Every time a system is deployed (for example in the CI using devstack), the initial DB schema is created. Then, each migration is applied sequentially. > - Some FT tests are still checking the sanity of some migrations [1] implemented a few releases ago. > - We are still testing the contract DB migrations. Of course, this is something supported before and we still need to apply those revisions. > - "TestWalkMigrationsMysql" and "TestModelsMigrationsMysql", both using MySQL backend, are still affected by LP#1687027. > > The proposal is to remove some DB migrations, starting from Liberty; of course, because all migrations must be applied in a specific order, we should begin from the initial revision, "kilo". The latest migration to be removed should be decided depending on the stable releases support. > > Apart from mitigating or solving some of the commented problems, we can "group" the DB model definition in one place. E.g.: "subnetpools" table is created in "other_extensions_init_ops". This file contains the first table. However is modified in at least two migrations: > - 1b4c6e320f79_address_scope_support_in_subnetpool: added "address_scope_id" field > - 13cfb89f881a_add_is_default_to_subnetpool: added "is_default" field > > Instead of having (at least) three places where the "subnetpools" DB schema is defined, we can remove the Mitaka migration and group this definition in just one place. > > One possible issue: some migrations add dependencies on other tables. That means the table the dependency is referring should be created in advance. That implies that, in some cases, the table creation order should be modified. That should never affect subsequent created tables or migrations. > > Do you see any inconvenience on this proposal? Am I missing something that I didn't consider? > > Thank you and regards. > > [1]https://github.com/openstack/neutron/blob/9fd60ffaac6b178de62dab169c826d52f7bfbb2d/neutron/tests/functional/db/test_migrations.py Hi, Simplification sounds good in general. Previously (up to Liberty release or some), we squashed all migrationed up to a specific past release. If you look at the git log of neutron/db/migration/alembic_migrations/versions/kilo_initial.py, you can see an example. However, it was stopped as squashing migrations needs to be done very carefully and even if we don't squash migrations the overhead of alembic migrations is not so high. You now raise this again, so it might be time to revisit it, so I am not against your proposal in general. I am not sure what you mean by "remove some DB migrations". Squashing migrations only related to some tables potentially introduces some confusion. A simpler approach looks like to merge all migrations up to a specific release (queens or rocky?). I think this approach addresses the problems you mentioned above. Thought? Akihiro From amotoki at gmail.com Wed Jul 1 02:49:20 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Wed, 1 Jul 2020 11:49:20 +0900 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona wrote: > > Hi, > Simplification sounds good (I do not take into considerations like "no code fanatic movements" or similar). > How this could affect upgrade, I am sure there are deployments older than pike, and those at a point will > got for some newer version (I hope we can give them good answers for their problems as Openstack) > > What do you think about stadium projects? As those have much less activity (as mostly solve one rather specific problem), > and much less migration scripts shall we just "merge" those to init ops? > I checked quickly a few stadium project and only bgpvpn has newer migration scripts than pike. In my understanding, squashing migrations can be done repository by repository. A revision hash of each migration is not changed and head revisions are stored in the database per repository, so it should work. For initial deployments, neutron-db-manage runs all db migrations from the initial revision to a specified revision (release), so it has no problem. For upgrade scenarios, this change just means that we just dropped support upgrade from releases included in squashed migrations. For example, if we squash migrations up to rocky (and create rocky_initial migration) in the neutron repo, we no longer support db migration from releases before rocky. This would be the only difference I see. Thanks, Akihiro > > Regards > Lajos > > Rodolfo Alonso Hernandez ezt írta (időpont: 2020. jún. 24., Sze, 15:25): >> >> Hello all: >> >> Along this years we have increased the number of DB migrations each time we needed a new DB schema. This is good because that means the project is evolving and adding new features. >> >> Although this is not a problem per se, there are some inconvenients: >> - Every time a system is deployed (for example in the CI using devstack), the initial DB schema is created. Then, each migration is applied sequentially. >> - Some FT tests are still checking the sanity of some migrations [1] implemented a few releases ago. >> - We are still testing the contract DB migrations. Of course, this is something supported before and we still need to apply those revisions. >> - "TestWalkMigrationsMysql" and "TestModelsMigrationsMysql", both using MySQL backend, are still affected by LP#1687027. >> >> The proposal is to remove some DB migrations, starting from Liberty; of course, because all migrations must be applied in a specific order, we should begin from the initial revision, "kilo". The latest migration to be removed should be decided depending on the stable releases support. >> >> Apart from mitigating or solving some of the commented problems, we can "group" the DB model definition in one place. E.g.: "subnetpools" table is created in "other_extensions_init_ops". This file contains the first table. However is modified in at least two migrations: >> - 1b4c6e320f79_address_scope_support_in_subnetpool: added "address_scope_id" field >> - 13cfb89f881a_add_is_default_to_subnetpool: added "is_default" field >> >> Instead of having (at least) three places where the "subnetpools" DB schema is defined, we can remove the Mitaka migration and group this definition in just one place. >> >> One possible issue: some migrations add dependencies on other tables. That means the table the dependency is referring should be created in advance. That implies that, in some cases, the table creation order should be modified. That should never affect subsequent created tables or migrations. >> >> Do you see any inconvenience on this proposal? Am I missing something that I didn't consider? >> >> Thank you and regards. >> >> [1]https://github.com/openstack/neutron/blob/9fd60ffaac6b178de62dab169c826d52f7bfbb2d/neutron/tests/functional/db/test_migrations.py >> From skaplons at redhat.com Wed Jul 1 07:39:17 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Wed, 1 Jul 2020 09:39:17 +0200 Subject: [neutron][drivers] Propose Rodolfo Alonso Hernandez for Neutron drivers team In-Reply-To: References: <20200623070333.kdvndgypjmuli7um@skaplons-mac> Message-ID: <6801333A-BAD9-4DC5-B900-7AF543B81DE3@redhat.com> Hi, It is already a week since I sent this nomination and I got only very positive feedback I added Rodolfo to the Neutron drivers team now. Welcome in the drivers Rodolfo and see You on our Friday’s meeting :) > On 25 Jun 2020, at 03:50, Akihiro Motoki wrote: > > +1 from me too. > It would be a great addition to the team. > > --amotoki > > On Tue, Jun 23, 2020 at 4:03 PM Slawek Kaplonski wrote: >> >> Hi, >> >> Rodolfo is very active Neutron contributor since long time. He has wide >> knowledge about all or almost all areas of the Neutron and Neutron stadium >> projects. >> He is an expert e.g. in ovs agent, pyroute and privsep module, openvswitch >> firewall, db layer, OVO and probably many others. He also has very good >> understanding about Neutron project in general, about it's design and >> direction of development. >> >> Rodolfo is also active on our drivers meetings already and I think that his >> feedback about many things there is very good and valuable for the team. >> For all these reasons I think that he will be great addition to our >> drivers team. >> >> I will keep this nomination open for a week waiting for Your feedback and >> votes. >> >> -- >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> > — Slawek Kaplonski Senior software engineer Red Hat From ralonsoh at redhat.com Wed Jul 1 07:58:06 2020 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 1 Jul 2020 08:58:06 +0100 Subject: [neutron][drivers] Propose Rodolfo Alonso Hernandez for Neutron drivers team In-Reply-To: <6801333A-BAD9-4DC5-B900-7AF543B81DE3@redhat.com> References: <20200623070333.kdvndgypjmuli7um@skaplons-mac> <6801333A-BAD9-4DC5-B900-7AF543B81DE3@redhat.com> Message-ID: Thank you very much! I'll do my best (Yoda said "there is no try"). Regards. On Wed, Jul 1, 2020 at 8:39 AM Slawek Kaplonski wrote: > Hi, > > It is already a week since I sent this nomination and I got only very > positive feedback I added Rodolfo to the Neutron drivers team now. > Welcome in the drivers Rodolfo and see You on our Friday’s meeting :) > > > On 25 Jun 2020, at 03:50, Akihiro Motoki wrote: > > > > +1 from me too. > > It would be a great addition to the team. > > > > --amotoki > > > > On Tue, Jun 23, 2020 at 4:03 PM Slawek Kaplonski > wrote: > >> > >> Hi, > >> > >> Rodolfo is very active Neutron contributor since long time. He has wide > >> knowledge about all or almost all areas of the Neutron and Neutron > stadium > >> projects. > >> He is an expert e.g. in ovs agent, pyroute and privsep module, > openvswitch > >> firewall, db layer, OVO and probably many others. He also has very good > >> understanding about Neutron project in general, about it's design and > >> direction of development. > >> > >> Rodolfo is also active on our drivers meetings already and I think that > his > >> feedback about many things there is very good and valuable for the team. > >> For all these reasons I think that he will be great addition to our > >> drivers team. > >> > >> I will keep this nomination open for a week waiting for Your feedback > and > >> votes. > >> > >> -- > >> Slawek Kaplonski > >> Senior software engineer > >> Red Hat > >> > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig at stackhpc.com Wed Jul 1 09:57:55 2020 From: stig at stackhpc.com (Stig Telfer) Date: Wed, 1 Jul 2020 10:57:55 +0100 Subject: [scientific-sig] No IRC meeting today Message-ID: Hi All - Unfortunately I am not available to help with today’s Scientific SIG IRC meeting. However, if you haven’t done so already I recommend signing up for the OpenDev virtual event - https://www.openstack.org/events/opendev-2020/ - today it’s bare metal and edge use cases. The last two days have been very useful sessions. Cheers, Stig From ruslanas at lpic.lt Wed Jul 1 12:19:51 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 1 Jul 2020 14:19:51 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: Hi all! Here we go, we are in the second part of this interesting troubleshooting! 1) I have LogTool setup.Thank you Arkady. 2) I have user OSP to create instance, and I have used virsh to create instance. 2.1) OSP way is failing in either way, if it is volume-based or image-based, it is failing either way.. [1] and [2] 2.2) when I create it using CLI: [0] [3] any ideas what can be wrong? What options I should choose? I have one network/vlan for whole cloud. I am doing proof of concept of remote booting, so I do not have br-ex setup. and I do not have br-provider. There is my compute[5] and controller[6] yaml files, Please help, how it should look like so it would have br-ex and br-int connected? as br-int now is in UNKNOWN state. And br-ex do not exist. As I understand, in roles data yaml, when we have tag external it should create br-ex? or am I wrong? [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs [2] http://paste.openstack.org/show/795431/ < controller logs [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ [4] http://paste.openstack.org/show/795433/ < xml file for [5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml [6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler wrote: > Hi all! > > I was able to analyze the attached log files and I hope that the results > may help you understand what's going wrong with instance creation. > You can find *Log_Tool's unique exported Error blocks* here: > http://paste.openstack.org/show/795356/ > > *Some statistics and problematical messages:* > ##### Statistics - Number of Errors/Warnings per Standard OSP log since: > 2020-06-30 12:30:00 ##### > Total_Number_Of_Errors --> 9 > /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 > /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 > /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 > > *nova-compute.log* > *default default] Error launching a defined domain with XML: type='kvm'>* > 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager > [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b > 69134106b56941698e58c61... > 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal > *error*: qemu unexpectedly closed the monitor: > 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... > he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set > MSR 0x48e to 0xfff9fffe04006172* > _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. > [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most > recent call last): > 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: > 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File > "/usr/lib/python3.6/site-packages/nova/vir... > > *server.log * > 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': 422} > returned with failed status* > > *ovn_controller.log* > 272-2020-06-30T12:30:10.126079625+02:00 stderr F > 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for > network 'datacentre'* > > Thanks! > > Compute nodes are baremetal or virtualized?, I've seen similar bug reports >>>>>>> when using nested virtualization in other OSes. >>>>>>> >>>>>> baremetal. Dell R630 if to be VERY precise. >>>>> >>>>> Thank you, I will try. I also modified a file, and it looked like it >>>>> relaunched podman container once config was changed. Either way, if I >>>>> understand Linux config correctly, the default value for user and group is >>>>> root, if commented out: >>>>> #user = "root" >>>>> #group = "root" >>>>> >>>>> also in some logs, I saw, that it detected, that it is not AMD CPU :) >>>>> and it is really not AMD CPU. >>>>> >>>>> >>>>> Just for fun, it might be important, here is how my node info looks. >>>>> ComputeS01Parameters: >>>>> NovaReservedHostMemory: 16384 >>>>> KernelArgs: "crashkernel=no rhgb" >>>>> ComputeS01ExtraConfig: >>>>> nova::cpu_allocation_ratio: 4.0 >>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>> _______________________________________________ >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbultel at redhat.com Wed Jul 1 15:05:21 2020 From: mbultel at redhat.com (Mathieu Bultel) Date: Wed, 1 Jul 2020 17:05:21 +0200 Subject: [tripleo][validations] new Validation Framework demos Message-ID: Hey TripleO, I have recorded three demos with the new Validation Framework (VF): 1st demo is similar to what Gael did few months ago but with the new code refactored (validations-libs/validations-common projects): https://asciinema.org/a/NRLULghjJa87qxRD9Nfq0FYoa 2nd demo is a use of the VF without any openstack/TripleO prerequisite, on a fresh and empty Ubuntu docker container, with only validations-libs and validations-common projects. It shows that only with a apt-get install git and python3-pip and with a basic python project installation we can run validations and use the framework: https://asciinema.org/a/2Jp9LZbN0xhJAR09zIpI6OpuB So it can answer a few demands such as: How to run validations as prep undercloud installation ? How to run validations on a non-openstack project ? What are the bare minimum requirements for being able to run Validations on a system ? May I run Validation remotely from my machine ? etc... The third one is mainly related to the deployment itself of TripleO. By using a simple PoC (https://review.opendev.org/#/c/724289/), I was able to make TripleO consuming the validations-libs framework and validation logging callback plugin. So it shows in this demo how the deploy steps playbook can be logged, parsed and shown with the VF CLI. This can be improve, modify & so on of course... it's basic usage. https://asciinema.org/a/344484 https://asciinema.org/a/344509 Mathieu. -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuel.mutel at gmail.com Wed Jul 1 15:34:32 2020 From: samuel.mutel at gmail.com (Samuel Mutel) Date: Wed, 1 Jul 2020 17:34:32 +0200 Subject: [CEILOMETER] Error when sending to prometheus pushgateway Message-ID: Hello, I have two questions about ceilometer (openstack version rocky). - First of all, it seems that ceilometer is sending metrics every hour and I don't understand why. - Next, I am not able to setup ceilometer to send metrics to prometheus pushgateway. Here is my configuration: > sources: > - name: meter_file > interval: 30 > meters: > - "*" > sinks: > - prometheus > > sinks: > - name: prometheus > publishers: > - prometheus://10.60.4.11:9091/metrics/job/ceilometer > Here is the error I received: > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 > # TYPE memory gauge > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2048 > # TYPE disk.ephemeral.size gauge > disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > # TYPE disk.root.size gauge > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > : HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http Traceback > (most recent call last): > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", line 178, > in _do_post > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > res.raise_for_status() > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/requests/models.py", line 935, in > raise_for_status > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http raise > HTTPError(http_error_msg, response=self) > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http HTTPError: > 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > Thanks for you help on this topic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Wed Jul 1 17:26:47 2020 From: ashlee at openstack.org (Ashlee Ferguson) Date: Wed, 1 Jul 2020 12:26:47 -0500 Subject: [all][summit][cfp] 2020 Open Infrastructure Summit Call for Presentations Open! Message-ID: <7DC7216B-91E8-4660-B759-788D818BCED5@openstack.org> We’re excited to announce that the Call for Presentations [1] for the 2020 Open Infrastructure Summit is now open until August 4! During the Summit, you’ll be able to join the people building and operating open infrastructure. Submit sessions featuring projects including Airship, Ansible, Ceph, Kata Containers, Kubernetes, ONAP, OpenStack, OPNFV, StarlingX and Zuul! 2020 Tracks • 5G, NFV & Edge • AI, Machine Learning & HPC • CI/CD • Container Infrastructure • Getting Started • Hands-on Workshops • Open Development • Private & Hybrid Cloud • Public Cloud • Security Types of sessions Presentations; demos encouraged Panel Discussions Lightning Talks If your talk is not selected for the official track schedule, we may reach out to have you present it as a Lightning Talk during the Summit in a shorter 10-15 minute format SUBMIT YOUR PRESENTATION [1] - Deadline August 4, 2020 Summit CFP is only for presentation, panel, and workshop submissions. The content submission process for the Forum and Project Teams Gathering (PTG) will be managed separately in the upcoming months. Programming Committee nominations are also now open. The Programming Committee helps select sessions from the CFP for the Summit schedule. Nominate yourself or or someone else for the Programming Committee [2] before July 10, 2020 and help us program the Summit! Registration and sponsorship coming soon! For sponsorship inquiries, please email kendall at openstack.org . Please email speakersupport at openstack.org with any CFP questions or feedback. Thanks, Ashlee [1] cfp.openstack.org [2] https://openstackfoundation.formstack.com/forms/programmingcommitteenom_summit2020 Ashlee Ferguson Community & Events Coordinator OpenStack Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongbin034 at gmail.com Wed Jul 1 19:24:42 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Wed, 1 Jul 2020 15:24:42 -0400 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' Message-ID: Hi all, A short question. I saw a few projects are using the name 'ca_file' [1] as config option, while others are using 'cafile' [2]. I wonder what is the flavorite name convention? I asked this question because Kolla developer suggested Zun to rename from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to confirm if this is a good idea from Keystone's perspective. Thanks. Best regards, Hongbin [1] http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= [2] http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= [3] https://review.opendev.org/#/c/738329/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Wed Jul 1 20:28:45 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Wed, 1 Jul 2020 15:28:45 -0500 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' In-Reply-To: References: Message-ID: On 7/1/20 2:24 PM, Hongbin Lu wrote: > Hi all, > > A short question. I saw a few projects are using the name 'ca_file' > [1] as config option, while others are using 'cafile' [2]. I wonder > what is the flavorite name convention? > > I asked this question because Kolla developer suggested Zun to rename > from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to > confirm if this is a good idea from Keystone's perspective. Thanks. > > Best regards, > Hongbin > > [1] > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= > [2] > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= > [3] https://review.opendev.org/#/c/738329/ Cinder and Glance both use ca_file (and ssl_ca_file and vmware_ca_file, and registry_client_ca_file). From keystone_auth, we do also have cafile. Personally, I find the separation of ca_file to be much easier to read. Sean From whayutin at redhat.com Wed Jul 1 20:49:22 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 1 Jul 2020 14:49:22 -0600 Subject: [tripleo][validations] new Validation Framework demos In-Reply-To: References: Message-ID: On Wed, Jul 1, 2020 at 9:07 AM Mathieu Bultel wrote: > Hey TripleO, > > I have recorded three demos with the new Validation Framework (VF): > 1st demo is similar to what Gael did few months ago but with the new code > refactored (validations-libs/validations-common projects): > https://asciinema.org/a/NRLULghjJa87qxRD9Nfq0FYoa > > 2nd demo is a use of the VF without any openstack/TripleO prerequisite, > on a fresh and empty Ubuntu docker container, with only validations-libs > and validations-common projects. > It shows that only with a apt-get install git and python3-pip and with a > basic python project installation we can run validations and use the > framework: > https://asciinema.org/a/2Jp9LZbN0xhJAR09zIpI6OpuB > > So it can answer a few demands such as: > How to run validations as prep undercloud installation ? > How to run validations on a non-openstack project ? > What are the bare minimum requirements for being able to run > Validations on a system ? May I run Validation remotely from my > machine ? etc... > > The third one is mainly related to the deployment itself of TripleO. > By using a simple PoC (https://review.opendev.org/#/c/724289/), I was > able to make TripleO consuming the validations-libs framework and > validation logging callback plugin. > So it shows in this demo how the deploy steps playbook can be logged, > parsed and shown with the VF CLI. This can be improve, modify & so on of > course... it's basic usage. > https://asciinema.org/a/344484 > https://asciinema.org/a/344509 > > Mathieu. > > Thanks for posting these Mathieu! This helps to visualize some of the topics discussed at the PTG. I like a lot of what I see here and I can see the value it will bring. I have some minor questions about the format of the logs.. like each task has TIMING in bold. Silly stuff like that. Looking forward to looking at this more in depth. Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Jul 1 21:45:11 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 1 Jul 2020 17:45:11 -0400 Subject: [cinder] spec freeze now in effect Message-ID: The Cinder spec freeze is now in effect. Here is the rundown on how the specs that have not yet been accepted for Victoria stand: The following specs have a spec freeze exception. Specs must be merged by 1600 UTC on 10 July. (Ideally, you'll have your revisions completed before next week's Cinder meeting on 8 July so we can discuss any issues at the meeting and give you time in case you need to make a final revision.) Remove quota usage cache https://review.opendev.org/#/c/730701/ - need to address some comments on the spec Support modern compression algorithms in cinder backup https://review.opendev.org/#/c/726307/ - needs a requirements change analysis; see comments on the review Reset state robustification https://review.opendev.org/#/c/682456/ - just needs to make moving the "force" option to cinder-manage explicit Default volume type overrides https://review.opendev.org/#/c/733555/ - need some tiem to work out the REST API change more carefully The following spec has been rejected for Victoria, but the team is OK with this being clarified (suggestions are in the comments on the review) and proposed for Wallaby: Support revert any snapshot to the volume https://review.opendev.org/#/c/736111/ The following spec has been rejected for Victoria, but because it's really a bug: volume list query optimization https://review.opendev.org/#/c/726070/ - it's been turned into https://bugs.launchpad.net/cinder/+bug/1885961 and the proposer can submit patches that address the bug. cheers, brian From openstack at nemebean.com Wed Jul 1 21:58:08 2020 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 1 Jul 2020 16:58:08 -0500 Subject: [oslo] PTO on Monday Message-ID: <6e2dc5a8-434d-380a-241c-5b29d26f8f12@nemebean.com> Hi Oslo, I'm making this a four day weekend (Friday is a US holiday), so I won't be around for the meeting on Monday. If someone else wants to run it then feel free to hold it without me. Otherwise we'll return to the regular schedule the following week. -Ben From whayutin at redhat.com Thu Jul 2 01:18:10 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 1 Jul 2020 19:18:10 -0600 Subject: [tripleo][ci] status RED In-Reply-To: References: Message-ID: On Mon, Jun 29, 2020 at 10:41 AM Wesley Hayutin wrote: > Greetings, > > Unfortunately both check and gate are RED atm due to [1]. The issue w/ > CirrOS-5.1 was fixed / reverted over the weekend [2]. I expect the check > and gate jobs to continue to be RED for the next few days as the > investigation proceeds. > > I would encourage folks to only workflow patches that are critical as the > chances you will actually merge anything is not great. > > > [1] https://bugs.launchpad.net/tripleo/+bug/1885286 > [2] https://review.opendev.org/#/c/738025/ > OK.. We have finally got our hands on an upstream CentOS-8 node and found the issue w/ retry_attempts. There is an issue w/ CentOS-8, OVS, and os-net-config. We have mitigated the issue by ensuring NetworkManager is disabled before the TripleO install bits start. Still working the issue but I think we're back to green. Thanks Alex and Sagi!!! FYI: https://bugs.launchpad.net/tripleo/+bug/1885286 I'm updating the topic in #tripleo now :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Thu Jul 2 01:21:15 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 1 Jul 2020 19:21:15 -0600 Subject: [tripleo][ci] status RED In-Reply-To: References: Message-ID: On Wed, Jul 1, 2020 at 7:18 PM Wesley Hayutin wrote: > > > On Mon, Jun 29, 2020 at 10:41 AM Wesley Hayutin > wrote: > >> Greetings, >> >> Unfortunately both check and gate are RED atm due to [1]. The issue w/ >> CirrOS-5.1 was fixed / reverted over the weekend [2]. I expect the check >> and gate jobs to continue to be RED for the next few days as the >> investigation proceeds. >> >> I would encourage folks to only workflow patches that are critical as the >> chances you will actually merge anything is not great. >> >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1885286 >> [2] https://review.opendev.org/#/c/738025/ >> > > OK.. > We have finally got our hands on an upstream CentOS-8 node and found the > issue w/ retry_attempts. There is an issue w/ CentOS-8, OVS, and > os-net-config. We have mitigated the issue by ensuring NetworkManager is > disabled before the TripleO install bits start. > > Still working the issue but I think we're back to green. Thanks Alex and > Sagi!!! > Also big thanks to the upstream infra folks, clark, fungi and others, for all the debug and extra time they spent w/ the tripleo team!! Much appreciated :) > > FYI: https://bugs.launchpad.net/tripleo/+bug/1885286 > > I'm updating the topic in #tripleo now > > :) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 2 07:23:39 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 2 Jul 2020 09:23:39 +0200 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' In-Reply-To: References: Message-ID: On Wed, Jul 1, 2020 at 10:31 PM Sean McGinnis wrote: > > On 7/1/20 2:24 PM, Hongbin Lu wrote: > > Hi all, > > > > A short question. I saw a few projects are using the name 'ca_file' > > [1] as config option, while others are using 'cafile' [2]. I wonder > > what is the flavorite name convention? > > > > I asked this question because Kolla developer suggested Zun to rename > > from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to > > confirm if this is a good idea from Keystone's perspective. Thanks. > > > > Best regards, > > Hongbin > > > > [1] > > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= > > [2] > > http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= > > [3] https://review.opendev.org/#/c/738329/ > > Cinder and Glance both use ca_file (and ssl_ca_file and vmware_ca_file, > and registry_client_ca_file). > From keystone_auth, we do also have cafile. > > Personally, I find the separation of ca_file to be much easier to read. > > Sean > > Yeah, it was me to suggest the aliasing. We found that the 'cafile' seems more prevalent. We missed that underscore for Zun and scratched our heads "what are we doing wrong there?". Nova has its most interesting because it uses cafile for clients but ca_file for hypervisors 🤷 -yoctozepto From info at dantalion.nl Thu Jul 2 08:37:35 2020 From: info at dantalion.nl (info at dantalion.nl) Date: Thu, 2 Jul 2020 10:37:35 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated Message-ID: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> Hello, the images on docker.io have last been updated 9 months ago https://hub.docker.com/u/loci, I was wondering when do they get updated? As I am currently waiting for the image for Watcher to be available, this image has recently been added as gate job. I require this image in order for the OpenStack helm charts to test https://review.opendev.org/#/c/720140/. Kind regards, Corne lukken From paye600 at gmail.com Thu Jul 2 10:35:59 2020 From: paye600 at gmail.com (Roman Gorshunov) Date: Thu, 2 Jul 2020 12:35:59 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> References: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> Message-ID: <61872A8F-5495-4C6E-AD86-14A61F9431A1@gmail.com> Hello Corne, Thank you for your email. i have investigated the issue, and seems that we have image push broken for some time. While we work on resolution, I could advice you to locally build images, if that suits you. I would post a reply here to the mailing list once issue is resolved. Again, thank you for paying attention and informing us. Best regards, Roman Gorshunov From sathlang at redhat.com Thu Jul 2 11:20:48 2020 From: sathlang at redhat.com (Sofer Athlan-Guyot) Date: Thu, 02 Jul 2020 13:20:48 +0200 Subject: [tripleo][update][blueprint] Update refactor: more feedback, more control, more speed. Message-ID: <875zb6noxb.fsf@s390sx.i-did-not-set--mail-host-address--so-tickle-me> Hi, hope you liked the title, I find it catchy. Update is mainly an afterthought that needs to work. So we mainly fix "stuff" there. No major change happened there since a long time. Following the PTG, I'm proposing a new blueprint and a bug: 1. Refactor tripleo update to offer the user more feedback and control[1]. 2. Registering node and repos can happen after some module check for packages[2]. I'm pretty new to this so I would need feedback about the form and content. For instance, point 2. could be a blueprint instead of a bug, tell me what you think. 1. refactor update step to load step playbook instead of looping over the steps: - this will speed up update (no more skipped tasks) - this will offer point of recovery when the update fails (by doing something like in named debug[3] for deployment) 2. refactor/fix? host-prep-tasks to include two steps: - step0 to add pre-update in-flight validation to the update process and rhosp registration; - step1 to all other tasks; - make sure it run in parallel on all nodes Point 1. would be a catch up with deployment. It offers speed improvement as we wouldn't skip tasks anymore. We could notify the user of what we are doing: "I'm removing the node from the cluster" instead of "step1". It would offer the user the hook to be able to restart a failed update from any step. Overall a big win, I think. Point 2. is newer, I filled it as a bug because I bumped into it as an issue when trying to add validation for subscription. It opens some possibilities for the update: - in-flight validation at the beginning of the update process that would be skipped during deployment using tag - using tags we could also run specific day 2 action outside of the update window: openstack overcloud update run --tags 'pre-update-validation' (with pre-update-validation in host-prep-tasks step0) openstack overcloud update run --tags 'rhsm-subscription' Well, it looked promising to me. Now, tell me what you think, but please, be nice, I'm old and susceptible. I have more coming, sorted by order of though I put into it, starting with the ones I though about more: - Check if we need a reboot of the server and notify the user. - Gain some more speed and clarity by having a running-on-all-host-in-parallel-host-update-prep-tasks new step. For instance all HA image tagging magic could go in there. - Investigate converge and check if we still could not further optimize it for update. I would like to gain more experience with the process before I filled those new blueprints. I'm going to draft a spec for the proposed blueprint and then I'll push some WIP code. Thanks, [1] https://blueprints.launchpad.net/tripleo/+spec/tripleo-update-smart-steps [2] https://bugs.launchpad.net/tripleo/+bug/1886028 [1] https://review.opendev.org/#/c/636731/ -- Sofer Athlan-Guyot chem on #irc DFG:Upgrades From ionut at fleio.com Thu Jul 2 12:42:36 2020 From: ionut at fleio.com (Ionut Biru) Date: Thu, 2 Jul 2020 15:42:36 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hello Rafael, Since the merging window for ussuri was long passed for those commits, is it safe to assume that it will not land in stable/ussuri at all and those will be available for victoria? How safe is to cherry pick those commits and use them in production? On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > The dynamic pollster in Ceilometer will be first released in Ussuri. > However, there are some important PRs still waiting for a merge, that might > be important for your use case: > * https://review.opendev.org/#/c/722092/ > * https://review.opendev.org/#/c/715180/ > * https://review.opendev.org/#/c/715289/ > * https://review.opendev.org/#/c/679999/ > * https://review.opendev.org/#/c/709807/ > > > On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves > wrote: > >> >> >> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >> >>> Hello, >>> >>> I want to meter the loadbalancer into gnocchi for billing purposes in >>> stein/train and ceilometer doesn't support dynamic pollsters. >>> >> >> I think I misunderstood your use case, sorry. I read it as if you wanted >> to know "if a loadbalancer was deployed and has status active". >> >> >>> Until I upgrade to Ussuri, is there a way to accomplish this? >>> >> >> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >> Ceilometer project. >> >> >>> >>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves >>> wrote: >>> >>>> Hi Ionut, >>>> >>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>> >>>>> Hello guys, >>>>> I was trying to add in polling.yaml and pipeline from ceilometer the >>>>> following: >>>>> - network.services.lb.active.connections >>>>> - network.services.lb.health_monitor >>>>> - network.services.lb.incoming.bytes >>>>> - network.services.lb.listener >>>>> - network.services.lb.loadbalancer >>>>> - network.services.lb.member >>>>> - network.services.lb.outgoing.bytes >>>>> - network.services.lb.pool >>>>> - network.services.lb.total.connections >>>>> >>>>> But it doesn't work, I think they are for the old lbs that were >>>>> supported in neutron. >>>>> >>>>> I found >>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>> but this is not available in stein or train. >>>>> >>>>> I was wondering if there is a way to meter loadbalancers from octavia. >>>>> I mostly want for start to just meter if a loadbalancer was deployed >>>>> and has status active. >>>>> >>>> >>>> You can get the provisioning and operating status of Octavia load >>>> balancers via the Octavia API. There is also an API endpoint that returns >>>> the full load balancer status tree [1]. Additionally, Octavia has >>>> three API endpoints for statistics [2][3][4]. >>>> >>>> I hope this helps with your use case. >>>> >>>> Cheers, >>>> Carlos >>>> >>>> [1] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>> [2] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>> [3] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>> [4] >>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>> >>>> >>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From opensrloo at gmail.com Thu Jul 2 13:37:00 2020 From: opensrloo at gmail.com (Ruby Loo) Date: Thu, 2 Jul 2020 09:37:00 -0400 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: Hi, On Tue, Jun 30, 2020 at 10:53 PM Akihiro Motoki wrote: > On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona wrote: > > > > Hi, > > Simplification sounds good (I do not take into considerations like "no > code fanatic movements" or similar). > > How this could affect upgrade, I am sure there are deployments older > than pike, and those at a point will > > got for some newer version (I hope we can give them good answers for > their problems as Openstack) > > > > What do you think about stadium projects? As those have much less > activity (as mostly solve one rather specific problem), > > and much less migration scripts shall we just "merge" those to init ops? > > I checked quickly a few stadium project and only bgpvpn has newer > migration scripts than pike. > > In my understanding, squashing migrations can be done repository by > repository. > A revision hash of each migration is not changed and head revisions > are stored in the database per repository, so it should work. > For initial deployments, neutron-db-manage runs all db migrations from > the initial revision to a specified revision (release), so it has no > problem. > For upgrade scenarios, this change just means that we just dropped > support upgrade from releases included in squashed migrations. > For example, if we squash migrations up to rocky (and create > rocky_initial migration) in the neutron repo, we no longer support db > migration from releases before rocky. This would be the only > difference I see. > I wonder if this is acceptable (that an OpenStack service will not support db migrations prior to rocky). What is (or is there?) OpenStack's stance wrt support for upgrades? We are using ocata and plan on upgrading but we don't know when that might happen :-( --ruby -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Thu Jul 2 13:47:42 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Thu, 2 Jul 2020 14:47:42 +0100 Subject: [all][stable] Moving the stable/ocata to 'Unmaintained' phase and then EOL In-Reply-To: <17260946293.bf44dcaa81001.6800161932877911216@ghanshyammann.com> References: <1725c3cbbd0.11d04fc8645393.9035729090460383424@ghanshyammann.com> <762e58c8-44f6-79d0-d674-43becf3eb42a@gmx.com> <17260946293.bf44dcaa81001.6800161932877911216@ghanshyammann.com> Message-ID: <20200702134742.ux3qaqotc4xlgbku@lyarwood.usersys.redhat.com> On 29-05-20 08:17:16, Ghanshyam Mann wrote: > ---- On Fri, 29 May 2020 07:54:05 -0500 Sean McGinnis wrote ---- > > On 5/29/20 6:34 AM, Előd Illés wrote: > > > [snip] > > > > > > TL;DR: If it's not feasible to fix a general issue of a job, then drop > > > that job. And I think we should not EOL Ocata in general, rather let > > > projects EOL their ocata branch if they cannot invest more time on > > > fixing them. > > > > The interdependency is the trick here. Some projects can easily EOL on > > their own and it's isolated enough that it doesn't cause issues. But for > > other projects, like Cinder and Nova that I mentioned, it's kind of an > > all-or-nothing situation. > > > > I suppose it is feasible that we drop testing to only running unit > > tests. If we don't run any kind of integration testing, then it does > > make these projects a little more independent. > > > > We still have the requirements issues though. Unless someone addresses > > any rot in the stable requirements, even unit tests become hard to run. > > > > Just thinking out loud on some of the issues I see. We can try to follow > > the original EM plan and leave it up to each project to declare their > > intent to go EOL, then tag ocata-eol to close it out. Or we can > > collectively decide Ocata is done and pull the big switch. > > From the stable policy if CI has broken nd no maintainer then we can move that > to unmaintained. And there is always time to revert back to EM if the maintainer shows up. > > IMO, maintaining only with unit tests is not a good idea. > > I have not heard from projects that they are interested to maintain it, if any then we can see > how to proceed otherwise collectively marking Ocata as Unmaintained is the right thing. Yup agreed, I'm going to be proposing that we move stable/ocata to unmaintained for openstack/nova at least FWIW, we haven't seen anything of value land there in the last three months: https://review.opendev.org/#/q/project:openstack/nova+branch:stable/ocata Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From ruslanas at lpic.lt Thu Jul 2 13:56:05 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 2 Jul 2020 15:56:05 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: Hi All, I have one idea, why it might be the issue. during image creation step, I have hadd missing packets: pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs PCS thing can be found in HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... I believe that is a case... so it installed non CentOS8 maintained kvm or some dependent packages.... How can I get osops-tools-monitoring-oschecks from centos repos? it is last seen in CentOS7 repos.... $ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2 osops-tools-monitoring-oschecks.noarch 0.0.1-0.20191202171903.bafe3f0.el7 rdo-trunk-train-tested ostree-debuginfo.x86_64 2019.1-2.el7 base-debuginfo (undercloud) [stack at ironic-poc ~]$ can I somehow not include that package in image creation? OR if it is essential, can I create a different repo for that one? On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis wrote: > Hi all! > > Here we go, we are in the second part of this interesting troubleshooting! > > 1) I have LogTool setup.Thank you Arkady. > > 2) I have user OSP to create instance, and I have used virsh to create > instance. > 2.1) OSP way is failing in either way, if it is volume-based or > image-based, it is failing either way.. [1] and [2] > 2.2) when I create it using CLI: [0] [3] > > any ideas what can be wrong? What options I should choose? > I have one network/vlan for whole cloud. I am doing proof of concept of > remote booting, so I do not have br-ex setup. and I do not have br-provider. > > There is my compute[5] and controller[6] yaml files, Please help, how it > should look like so it would have br-ex and br-int connected? as br-int now > is in UNKNOWN state. And br-ex do not exist. > As I understand, in roles data yaml, when we have tag external it should > create br-ex? or am I wrong? > > [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. > [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs > [2] http://paste.openstack.org/show/795431/ < controller logs > [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ > [4] http://paste.openstack.org/show/795433/ < xml file for > [5] > https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml > [6] > https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml > > > On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler > wrote: > >> Hi all! >> >> I was able to analyze the attached log files and I hope that the results >> may help you understand what's going wrong with instance creation. >> You can find *Log_Tool's unique exported Error blocks* here: >> http://paste.openstack.org/show/795356/ >> >> *Some statistics and problematical messages:* >> ##### Statistics - Number of Errors/Warnings per Standard OSP log since: >> 2020-06-30 12:30:00 ##### >> Total_Number_Of_Errors --> 9 >> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >> >> *nova-compute.log* >> *default default] Error launching a defined domain with XML: > type='kvm'>* >> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >> 69134106b56941698e58c61... >> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >> *error*: qemu unexpectedly closed the monitor: >> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set >> MSR 0x48e to 0xfff9fffe04006172* >> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >> recent call last): >> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >> "/usr/lib/python3.6/site-packages/nova/vir... >> >> *server.log * >> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': 422} >> returned with failed status* >> >> *ovn_controller.log* >> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >> network 'datacentre'* >> >> Thanks! >> >> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>> >>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>> >>>>>> Thank you, I will try. I also modified a file, and it looked like it >>>>>> relaunched podman container once config was changed. Either way, if I >>>>>> understand Linux config correctly, the default value for user and group is >>>>>> root, if commented out: >>>>>> #user = "root" >>>>>> #group = "root" >>>>>> >>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU :) >>>>>> and it is really not AMD CPU. >>>>>> >>>>>> >>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>> ComputeS01Parameters: >>>>>> NovaReservedHostMemory: 16384 >>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>> ComputeS01ExtraConfig: >>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>> _______________________________________________ >>>>>> >>>>>> > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 2 13:58:46 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 2 Jul 2020 15:58:46 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: by the way in CentOS8, here is an error message I receive when searching around [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested': - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml (IP: 3.87.151.16) Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried [stack at rdo-u ~]$ On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis wrote: > Hi All, > > I have one idea, why it might be the issue. > > during image creation step, I have hadd missing packets: > pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs > PCS thing can be found in HA repo, so I enabled it, but > "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... > > I believe that is a case... > so it installed non CentOS8 maintained kvm or some dependent packages.... > > How can I get osops-tools-monitoring-oschecks from centos repos? it is > last seen in CentOS7 repos.... > > $ yum list --enablerepo=* --disablerepo "c7-media" | grep > osops-tools-monitoring-oschecks -A2 > osops-tools-monitoring-oschecks.noarch > 0.0.1-0.20191202171903.bafe3f0.el7 > > rdo-trunk-train-tested > ostree-debuginfo.x86_64 2019.1-2.el7 > base-debuginfo > (undercloud) [stack at ironic-poc ~]$ > > can I somehow not include that package in image creation? OR if it is > essential, can I create a different repo for that one? > > > > > On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis wrote: > >> Hi all! >> >> Here we go, we are in the second part of this interesting troubleshooting! >> >> 1) I have LogTool setup.Thank you Arkady. >> >> 2) I have user OSP to create instance, and I have used virsh to create >> instance. >> 2.1) OSP way is failing in either way, if it is volume-based or >> image-based, it is failing either way.. [1] and [2] >> 2.2) when I create it using CLI: [0] [3] >> >> any ideas what can be wrong? What options I should choose? >> I have one network/vlan for whole cloud. I am doing proof of concept of >> remote booting, so I do not have br-ex setup. and I do not have br-provider. >> >> There is my compute[5] and controller[6] yaml files, Please help, how it >> should look like so it would have br-ex and br-int connected? as br-int now >> is in UNKNOWN state. And br-ex do not exist. >> As I understand, in roles data yaml, when we have tag external it should >> create br-ex? or am I wrong? >> >> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. >> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs >> [2] http://paste.openstack.org/show/795431/ < controller logs >> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >> [4] http://paste.openstack.org/show/795433/ < xml file for >> [5] >> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >> [6] >> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >> >> >> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >> wrote: >> >>> Hi all! >>> >>> I was able to analyze the attached log files and I hope that the results >>> may help you understand what's going wrong with instance creation. >>> You can find *Log_Tool's unique exported Error blocks* here: >>> http://paste.openstack.org/show/795356/ >>> >>> *Some statistics and problematical messages:* >>> ##### Statistics - Number of Errors/Warnings per Standard OSP log since: >>> 2020-06-30 12:30:00 ##### >>> Total_Number_Of_Errors --> 9 >>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>> >>> *nova-compute.log* >>> *default default] Error launching a defined domain with XML: >> type='kvm'>* >>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>> 69134106b56941698e58c61... >>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >>> *error*: qemu unexpectedly closed the monitor: >>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set >>> MSR 0x48e to 0xfff9fffe04006172* >>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>> recent call last): >>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >>> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>> "/usr/lib/python3.6/site-packages/nova/vir... >>> >>> *server.log * >>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>> 422} returned with failed status* >>> >>> *ovn_controller.log* >>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>> network 'datacentre'* >>> >>> Thanks! >>> >>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>> >>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>> >>>>>>> Thank you, I will try. I also modified a file, and it looked like it >>>>>>> relaunched podman container once config was changed. Either way, if I >>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>> root, if commented out: >>>>>>> #user = "root" >>>>>>> #group = "root" >>>>>>> >>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU >>>>>>> :) and it is really not AMD CPU. >>>>>>> >>>>>>> >>>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>>> ComputeS01Parameters: >>>>>>> NovaReservedHostMemory: 16384 >>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>> ComputeS01ExtraConfig: >>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>> _______________________________________________ >>>>>>> >>>>>>> >> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lyarwood at redhat.com Thu Jul 2 14:05:28 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Thu, 2 Jul 2020 15:05:28 +0100 Subject: [nova][stable] The openstack/nova stable/ocata branch is currently unmaintained Message-ID: <20200702140528.yrwrpyv6nt72kzlb@lyarwood.usersys.redhat.com> Hello all, A quick note to highlight that the stable/ocata branch of openstack/nova [1] is formally in the ``Unmaintained`` [2] phase of maintenance will be moved on to the final ``EOL`` phase after a total of 6 months of inactivity. I'm going to suggest that we ignore the following change as this only attempted to remove a job from the experimental queue and doesn't constitute actual maintenance of the branch IMHO. Remove exp legacy-tempest-dsvm-full-devstack-plugin-nfs https://review.opendev.org/#/c/714958/ As a result I consider the branch to have been inactive for 3 of the required 6 months before it can be marked as ``EOL`` [3]. Volunteers are welcome to step forward and attempt to move the branch back to the ``Extended Maintenance`` phase by proposing changes and fixing CI in the next 3 months, otherwise the branch will be marked as ``EOL``. Hopefully this isn't taking anyone by surprise but please let me know if this is going to be an issue! Regards, [1] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/ocata [2] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained [3] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From aschultz at redhat.com Thu Jul 2 14:07:46 2020 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 2 Jul 2020 08:07:46 -0600 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: current-passed-ci is not a valid repo. https://trunk.rdoproject.org/centos8-ussuri/ How are you configuring these repos? On Thu, Jul 2, 2020 at 7:59 AM Ruslanas Gžibovskis wrote: > > by the way in CentOS8, here is an error message I receive when searching around > > [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks > Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested': > - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml (IP: 3.87.151.16) > Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried > [stack at rdo-u ~]$ > > On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis wrote: >> >> Hi All, >> >> I have one idea, why it might be the issue. >> >> during image creation step, I have hadd missing packets: >> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >> PCS thing can be found in HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >> >> I believe that is a case... >> so it installed non CentOS8 maintained kvm or some dependent packages.... >> >> How can I get osops-tools-monitoring-oschecks from centos repos? it is last seen in CentOS7 repos.... >> >> $ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2 >> osops-tools-monitoring-oschecks.noarch 0.0.1-0.20191202171903.bafe3f0.el7 >> rdo-trunk-train-tested >> ostree-debuginfo.x86_64 2019.1-2.el7 base-debuginfo >> (undercloud) [stack at ironic-poc ~]$ >> >> can I somehow not include that package in image creation? OR if it is essential, can I create a different repo for that one? >> >> >> >> >> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis wrote: >>> >>> Hi all! >>> >>> Here we go, we are in the second part of this interesting troubleshooting! >>> >>> 1) I have LogTool setup.Thank you Arkady. >>> >>> 2) I have user OSP to create instance, and I have used virsh to create instance. >>> 2.1) OSP way is failing in either way, if it is volume-based or image-based, it is failing either way.. [1] and [2] >>> 2.2) when I create it using CLI: [0] [3] >>> >>> any ideas what can be wrong? What options I should choose? >>> I have one network/vlan for whole cloud. I am doing proof of concept of remote booting, so I do not have br-ex setup. and I do not have br-provider. >>> >>> There is my compute[5] and controller[6] yaml files, Please help, how it should look like so it would have br-ex and br-int connected? as br-int now is in UNKNOWN state. And br-ex do not exist. >>> As I understand, in roles data yaml, when we have tag external it should create br-ex? or am I wrong? >>> >>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. >>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs >>> [2] http://paste.openstack.org/show/795431/ < controller logs >>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>> [4] http://paste.openstack.org/show/795433/ < xml file for >>> [5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>> [6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>> >>> >>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler wrote: >>>> >>>> Hi all! >>>> >>>> I was able to analyze the attached log files and I hope that the results may help you understand what's going wrong with instance creation. >>>> You can find Log_Tool's unique exported Error blocks here: http://paste.openstack.org/show/795356/ >>>> >>>> Some statistics and problematical messages: >>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log since: 2020-06-30 12:30:00 ##### >>>> Total_Number_Of_Errors --> 9 >>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>> >>>> nova-compute.log >>>> default default] Error launching a defined domain with XML: >>>> 368-2020-06-30 12:30:10.815 7 ERROR nova.compute.manager [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b 69134106b56941698e58c61... >>>> 70dc50f] Instance failed to spawn: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-06-30T10:30:10.182675Z qemu-kvm: error: failed to set MSR 0... >>>> he monitor: 2020-06-30T10:30:10.182675Z qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172 >>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. >>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] Traceback (most recent call last): >>>> 375-2020-06-30 12:30:10.815 7 ERROR nova.compute.manager [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File "/usr/lib/python3.6/site-packages/nova/vir... >>>> >>>> server.log >>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', 'code': 422} returned with failed status >>>> >>>> ovn_controller.log >>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F 2020-06-30T10:30:10Z|00247|patch|WARN|Bridge 'br-ex' not found for network 'datacentre' >>>> >>>> Thanks! >>>> >>>>>>>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug reports when using nested virtualization in other OSes. >>>>>>>> >>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>> >>>>>>>> Thank you, I will try. I also modified a file, and it looked like it relaunched podman container once config was changed. Either way, if I understand Linux config correctly, the default value for user and group is root, if commented out: >>>>>>>> #user = "root" >>>>>>>> #group = "root" >>>>>>>> >>>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU :) and it is really not AMD CPU. >>>>>>>> >>>>>>>> >>>>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>>>> ComputeS01Parameters: >>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>> ComputeS01ExtraConfig: >>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>> _______________________________________________ >>>>>>>> >>> >> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 > > > > -- > Ruslanas Gžibovskis > +370 6030 7030 > _______________________________________________ > users mailing list > users at lists.rdoproject.org > http://lists.rdoproject.org/mailman/listinfo/users > > To unsubscribe: users-unsubscribe at lists.rdoproject.org From amoralej at redhat.com Thu Jul 2 14:17:56 2020 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Thu, 2 Jul 2020 16:17:56 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis wrote: > by the way in CentOS8, here is an error message I receive when searching > around > > [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo > "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks > Errors during downloading metadata for repository > 'rdo-trunk-ussuri-tested': > - Status code: 403 for > https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml > (IP: 3.87.151.16) > Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': > Cannot download repomd.xml: Cannot download repodata/repomd.xml: All > mirrors were tried > [stack at rdo-u ~]$ > > Yep, rdo-trunk-ussuri-tested repo included in the release rpm is disabled by default and not longer usable (i'll send a patch to retire it), don't enable it. Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead to install CentOS8 maintained kvm. BTW, i think that package should not be required in CentOS8: https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def > On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis wrote: > >> Hi All, >> >> I have one idea, why it might be the issue. >> >> during image creation step, I have hadd missing packets: >> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >> PCS thing can be found in HA repo, so I enabled it, but >> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >> >> I believe that is a case... >> so it installed non CentOS8 maintained kvm or some dependent packages.... >> >> How can I get osops-tools-monitoring-oschecks from centos repos? it is >> last seen in CentOS7 repos.... >> >> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >> osops-tools-monitoring-oschecks -A2 >> osops-tools-monitoring-oschecks.noarch >> 0.0.1-0.20191202171903.bafe3f0.el7 >> >> rdo-trunk-train-tested >> ostree-debuginfo.x86_64 2019.1-2.el7 >> base-debuginfo >> (undercloud) [stack at ironic-poc ~]$ >> >> can I somehow not include that package in image creation? OR if it is >> essential, can I create a different repo for that one? >> >> >> >> >> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >> wrote: >> >>> Hi all! >>> >>> Here we go, we are in the second part of this interesting >>> troubleshooting! >>> >>> 1) I have LogTool setup.Thank you Arkady. >>> >>> 2) I have user OSP to create instance, and I have used virsh to create >>> instance. >>> 2.1) OSP way is failing in either way, if it is volume-based or >>> image-based, it is failing either way.. [1] and [2] >>> 2.2) when I create it using CLI: [0] [3] >>> >>> any ideas what can be wrong? What options I should choose? >>> I have one network/vlan for whole cloud. I am doing proof of concept of >>> remote booting, so I do not have br-ex setup. and I do not have br-provider. >>> >>> There is my compute[5] and controller[6] yaml files, Please help, how it >>> should look like so it would have br-ex and br-int connected? as br-int now >>> is in UNKNOWN state. And br-ex do not exist. >>> As I understand, in roles data yaml, when we have tag external it should >>> create br-ex? or am I wrong? >>> >>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>> running. >>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs >>> [2] http://paste.openstack.org/show/795431/ < controller logs >>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>> [4] http://paste.openstack.org/show/795433/ < xml file for >>> [5] >>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>> [6] >>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>> >>> >>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>> wrote: >>> >>>> Hi all! >>>> >>>> I was able to analyze the attached log files and I hope that the >>>> results may help you understand what's going wrong with instance creation. >>>> You can find *Log_Tool's unique exported Error blocks* here: >>>> http://paste.openstack.org/show/795356/ >>>> >>>> *Some statistics and problematical messages:* >>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>> since: 2020-06-30 12:30:00 ##### >>>> Total_Number_Of_Errors --> 9 >>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>> >>>> *nova-compute.log* >>>> *default default] Error launching a defined domain with XML: >>> type='kvm'>* >>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>> 69134106b56941698e58c61... >>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >>>> *error*: qemu unexpectedly closed the monitor: >>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>> set MSR 0x48e to 0xfff9fffe04006172* >>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>> recent call last): >>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >>>> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>> >>>> *server.log * >>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>> 422} returned with failed status* >>>> >>>> *ovn_controller.log* >>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>> network 'datacentre'* >>>> >>>> Thanks! >>>> >>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>> >>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>> >>>>>>>> Thank you, I will try. I also modified a file, and it looked like >>>>>>>> it relaunched podman container once config was changed. Either way, if I >>>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>>> root, if commented out: >>>>>>>> #user = "root" >>>>>>>> #group = "root" >>>>>>>> >>>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU >>>>>>>> :) and it is really not AMD CPU. >>>>>>>> >>>>>>>> >>>>>>>> Just for fun, it might be important, here is how my node info looks. >>>>>>>> ComputeS01Parameters: >>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>> ComputeS01ExtraConfig: >>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>> _______________________________________________ >>>>>>>> >>>>>>>> >>> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> > > > -- > Ruslanas Gžibovskis > +370 6030 7030 > _______________________________________________ > users mailing list > users at lists.rdoproject.org > http://lists.rdoproject.org/mailman/listinfo/users > > To unsubscribe: users-unsubscribe at lists.rdoproject.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 2 14:38:04 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 2 Jul 2020 17:38:04 +0300 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: it is, i have image build failing. i can modify yaml used to create image. can you remind me which files it would be? and your question, "how it can impact kvm": in image most of the packages get deployed from deloren repos. I believe part is from centos repos and part of whole packages in overcloud-full.qcow2 are from deloren. so it might have bit different minor version, that might be incompactible... at least it have happend for me previously with train release so i used tested ci fully from the beginning... I might be for sure wrong. On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, wrote: > > > On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis > wrote: > >> by the way in CentOS8, here is an error message I receive when searching >> around >> >> [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo >> "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks >> Errors during downloading metadata for repository >> 'rdo-trunk-ussuri-tested': >> - Status code: 403 for >> https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml >> (IP: 3.87.151.16) >> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': >> Cannot download repomd.xml: Cannot download repodata/repomd.xml: All >> mirrors were tried >> [stack at rdo-u ~]$ >> >> > Yep, rdo-trunk-ussuri-tested repo included in the release rpm is disabled > by default and not longer usable (i'll send a patch to retire it), don't > enable it. > > Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead to > install CentOS8 maintained kvm. BTW, i think that package should not be > required in CentOS8: > > > https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def > > > > >> On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis >> wrote: >> >>> Hi All, >>> >>> I have one idea, why it might be the issue. >>> >>> during image creation step, I have hadd missing packets: >>> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >>> PCS thing can be found in HA repo, so I enabled it, but >>> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >>> >>> I believe that is a case... >>> so it installed non CentOS8 maintained kvm or some dependent packages.... >>> >>> How can I get osops-tools-monitoring-oschecks from centos repos? it is >>> last seen in CentOS7 repos.... >>> >>> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >>> osops-tools-monitoring-oschecks -A2 >>> osops-tools-monitoring-oschecks.noarch >>> 0.0.1-0.20191202171903.bafe3f0.el7 >>> >>> rdo-trunk-train-tested >>> ostree-debuginfo.x86_64 2019.1-2.el7 >>> base-debuginfo >>> (undercloud) [stack at ironic-poc ~]$ >>> >>> can I somehow not include that package in image creation? OR if it is >>> essential, can I create a different repo for that one? >>> >>> >>> >>> >>> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >>> wrote: >>> >>>> Hi all! >>>> >>>> Here we go, we are in the second part of this interesting >>>> troubleshooting! >>>> >>>> 1) I have LogTool setup.Thank you Arkady. >>>> >>>> 2) I have user OSP to create instance, and I have used virsh to create >>>> instance. >>>> 2.1) OSP way is failing in either way, if it is volume-based or >>>> image-based, it is failing either way.. [1] and [2] >>>> 2.2) when I create it using CLI: [0] [3] >>>> >>>> any ideas what can be wrong? What options I should choose? >>>> I have one network/vlan for whole cloud. I am doing proof of concept of >>>> remote booting, so I do not have br-ex setup. and I do not have br-provider. >>>> >>>> There is my compute[5] and controller[6] yaml files, Please help, how >>>> it should look like so it would have br-ex and br-int connected? as >>>> br-int now is in UNKNOWN state. And br-ex do not exist. >>>> As I understand, in roles data yaml, when we have tag external it >>>> should create br-ex? or am I wrong? >>>> >>>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>>> running. >>>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute >>>> logs >>>> [2] http://paste.openstack.org/show/795431/ < controller logs >>>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>>> [4] http://paste.openstack.org/show/795433/ < xml file for >>>> [5] >>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>>> [6] >>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>>> >>>> >>>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>>> wrote: >>>> >>>>> Hi all! >>>>> >>>>> I was able to analyze the attached log files and I hope that the >>>>> results may help you understand what's going wrong with instance creation. >>>>> You can find *Log_Tool's unique exported Error blocks* here: >>>>> http://paste.openstack.org/show/795356/ >>>>> >>>>> *Some statistics and problematical messages:* >>>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>>> since: 2020-06-30 12:30:00 ##### >>>>> Total_Number_Of_Errors --> 9 >>>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>>> >>>>> *nova-compute.log* >>>>> *default default] Error launching a defined domain with XML: >>>> type='kvm'>* >>>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>>> 69134106b56941698e58c61... >>>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal >>>>> *error*: qemu unexpectedly closed the monitor: >>>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>>> set MSR 0x48e to 0xfff9fffe04006172* >>>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>>> recent call last): >>>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: >>>>> 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>>> >>>>> *server.log * >>>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>>> 422} returned with failed status* >>>>> >>>>> *ovn_controller.log* >>>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>>> network 'datacentre'* >>>>> >>>>> Thanks! >>>>> >>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>>> >>>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>>> >>>>>>>>> Thank you, I will try. I also modified a file, and it looked like >>>>>>>>> it relaunched podman container once config was changed. Either way, if I >>>>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>>>> root, if commented out: >>>>>>>>> #user = "root" >>>>>>>>> #group = "root" >>>>>>>>> >>>>>>>>> also in some logs, I saw, that it detected, that it is not AMD CPU >>>>>>>>> :) and it is really not AMD CPU. >>>>>>>>> >>>>>>>>> >>>>>>>>> Just for fun, it might be important, here is how my node info >>>>>>>>> looks. >>>>>>>>> ComputeS01Parameters: >>>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>>> ComputeS01ExtraConfig: >>>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>>> _______________________________________________ >>>>>>>>> >>>>>>>>> >>>> >>> >>> -- >>> Ruslanas Gžibovskis >>> +370 6030 7030 >>> >> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> _______________________________________________ >> users mailing list >> users at lists.rdoproject.org >> http://lists.rdoproject.org/mailman/listinfo/users >> >> To unsubscribe: users-unsubscribe at lists.rdoproject.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltoscano at redhat.com Thu Jul 2 14:48:05 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Thu, 02 Jul 2020 16:48:05 +0200 Subject: [nova][stable] The openstack/nova stable/ocata branch is currently unmaintained In-Reply-To: <20200702140528.yrwrpyv6nt72kzlb@lyarwood.usersys.redhat.com> References: <20200702140528.yrwrpyv6nt72kzlb@lyarwood.usersys.redhat.com> Message-ID: <3422063.e9J7NaK4W3@whitebase.usersys.redhat.com> On Thursday, 2 July 2020 16:05:28 CEST Lee Yarwood wrote: > Hello all, > > A quick note to highlight that the stable/ocata branch of openstack/nova > [1] is formally in the ``Unmaintained`` [2] phase of maintenance will be > moved on to the final ``EOL`` phase after a total of 6 months of > inactivity. > > I'm going to suggest that we ignore the following change as this only > attempted to remove a job from the experimental queue and doesn't > constitute actual maintenance of the branch IMHO. > > Remove exp legacy-tempest-dsvm-full-devstack-plugin-nfs > https://review.opendev.org/#/c/714958/ The purpose of that change is to remove a job which is going to be removed from cinder too (hopefully) and finally from project-config. If ocata moves to EOL it will be possible to clean that legacy job too, so fine by me! > > As a result I consider the branch to have been inactive for 3 of the > required 6 months before it can be marked as ``EOL`` [3]. > > Volunteers are welcome to step forward and attempt to move the branch > back to the ``Extended Maintenance`` phase by proposing changes and > fixing CI in the next 3 months, otherwise the branch will be marked as > ``EOL``. And if anyone does, make sure to merge my change above :) Ciao -- Luigi From rafaelweingartner at gmail.com Thu Jul 2 14:49:52 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 2 Jul 2020 11:49:52 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: > Since the merging window for ussuri was long passed for those commits, is > it safe to assume that it will not land in stable/ussuri at all and those > will be available for victoria? > I would say so. We are lacking people to review and then merge it. How safe is to cherry pick those commits and use them in production? > As long as the person executing the cherry-picks, and maintaining the code knows what she/he is doing, you should be safe. The guys that are using this implementation (and others that I and my colleagues proposed), have a few openstack components that are customized with the patches/enhancements/extensions we developed so far; this means, they are not using the community version, but something in-between (the community releases + the patches we did). Of course, it is only possible, because we are the ones creating and maintaining these codes; therefore, we can assure quality for production. On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: > Hello Rafael, > > Since the merging window for ussuri was long passed for those commits, is > it safe to assume that it will not land in stable/ussuri at all and those > will be available for victoria? > > How safe is to cherry pick those commits and use them in production? > > > > On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> The dynamic pollster in Ceilometer will be first released in Ussuri. >> However, there are some important PRs still waiting for a merge, that might >> be important for your use case: >> * https://review.opendev.org/#/c/722092/ >> * https://review.opendev.org/#/c/715180/ >> * https://review.opendev.org/#/c/715289/ >> * https://review.opendev.org/#/c/679999/ >> * https://review.opendev.org/#/c/709807/ >> >> >> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves >> wrote: >> >>> >>> >>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>> >>>> Hello, >>>> >>>> I want to meter the loadbalancer into gnocchi for billing purposes in >>>> stein/train and ceilometer doesn't support dynamic pollsters. >>>> >>> >>> I think I misunderstood your use case, sorry. I read it as if you wanted >>> to know "if a loadbalancer was deployed and has status active". >>> >>> >>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>> >>> >>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>> Ceilometer project. >>> >>> >>>> >>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>> cgoncalves at redhat.com> wrote: >>>> >>>>> Hi Ionut, >>>>> >>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>>> >>>>>> Hello guys, >>>>>> I was trying to add in polling.yaml and pipeline from ceilometer the >>>>>> following: >>>>>> - network.services.lb.active.connections >>>>>> - network.services.lb.health_monitor >>>>>> - network.services.lb.incoming.bytes >>>>>> - network.services.lb.listener >>>>>> - network.services.lb.loadbalancer >>>>>> - network.services.lb.member >>>>>> - network.services.lb.outgoing.bytes >>>>>> - network.services.lb.pool >>>>>> - network.services.lb.total.connections >>>>>> >>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>> supported in neutron. >>>>>> >>>>>> I found >>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>> but this is not available in stein or train. >>>>>> >>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>> octavia. >>>>>> I mostly want for start to just meter if a loadbalancer was deployed >>>>>> and has status active. >>>>>> >>>>> >>>>> You can get the provisioning and operating status of Octavia load >>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>> three API endpoints for statistics [2][3][4]. >>>>> >>>>> I hope this helps with your use case. >>>>> >>>>> Cheers, >>>>> Carlos >>>>> >>>>> [1] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>> [2] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>> [3] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>> [4] >>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>> >>>>> >>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From akekane at redhat.com Thu Jul 2 15:22:30 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Thu, 2 Jul 2020 20:52:30 +0530 Subject: [glance] Weekly review priorities Message-ID: Hi Team, We are 3 weeks away from the Victoria milestone 2 release and already our review stack is increasing day by day. We need to get below important specs reviewed and merged in the next couple of weeks. Also we need some reviews on backports as well as important fixes. I have sorted down some patches which need reviews for this week. Specs: 1. sparse image upload - https://review.opendev.org/733157 2. Unified limits - https://review.opendev.org/729187 3. Image encryption - https://review.opendev.org/609667 4. Cinder store multiple stores support - https://review.opendev.org/695152 5. Duplicated image downloads - https://review.opendev.org/734683 6. Add copy-unowned-image spec https://review.opendev.org/739062 Backports: 1. Add lock per share for cinder nfs mount/umount - https://review.opendev.org/#/c/726650/ (stable/train) 2. Add lock per share for cinder nfs mount/umount - https://review.opendev.org/#/c/726914/ (stable/ussuri) 3. zuul: switch to the "plain" grenade job here too - https://review.opendev.org/739056 4. Use grenade-multinode instead of the custom legacy job - https://review.opendev.org/738693 Bug fixes on master: 1. Add image_set_property_atomic() helper - https://review.opendev.org/737868 2. Fix race condition in copy image operation - https://review.opendev.org/737596 3. Don't include plugins on 'copy-image' import - https://review.opendev.org/738675 4. Fix: Interrupted copy-image leaking data on subsequent operation - https://review.opendev.org/737867 Cleanup patches: 1. Removal of 'enable_v2_api' - https://review.opendev.org/#/c/738672/ (review dependency chain as well) Happy reviewing!! Abhishek -------------- next part -------------- An HTML attachment was scrubbed... URL: From amoralej at redhat.com Thu Jul 2 15:35:07 2020 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Thu, 2 Jul 2020 17:35:07 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis wrote: > it is, i have image build failing. i can modify yaml used to create image. > can you remind me which files it would be? > > Right, I see that the patch must not be working fine for centos and the package is being installed from delorean repos in the log. I guess it needs an entry to cover the centos 8 case (i'm checking with opstools maintainer). As workaround I'd propose you to use the package from: https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ or alternatively applying some local patch to tripleo-puppet-elements. > and your question, "how it can impact kvm": > > in image most of the packages get deployed from deloren repos. I believe > part is from centos repos and part of whole packages in > overcloud-full.qcow2 are from deloren. so it might have bit different minor > version, that might be incompactible... at least it have happend for me > previously with train release so i used tested ci fully from the > beginning... > I might be for sure wrong. > Delorean repos contain only OpenStack packages, things like nova, etc... not kvm or things included in CentOS repos. KVM will always installed which should be installed from "Advanced Virtualization" repository. May you check what versions of qemu-kvm and libvirt you got installed into the overcloud-full image?, it should match with the versions in: http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm > > On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, > wrote: > >> >> >> On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis >> wrote: >> >>> by the way in CentOS8, here is an error message I receive when searching >>> around >>> >>> [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo >>> "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks >>> Errors during downloading metadata for repository >>> 'rdo-trunk-ussuri-tested': >>> - Status code: 403 for >>> https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml >>> (IP: 3.87.151.16) >>> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': >>> Cannot download repomd.xml: Cannot download repodata/repomd.xml: All >>> mirrors were tried >>> [stack at rdo-u ~]$ >>> >>> >> Yep, rdo-trunk-ussuri-tested repo included in the release rpm is disabled >> by default and not longer usable (i'll send a patch to retire it), don't >> enable it. >> >> Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead >> to install CentOS8 maintained kvm. BTW, i think that package should not be >> required in CentOS8: >> >> >> https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def >> >> >> >> >>> On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis >>> wrote: >>> >>>> Hi All, >>>> >>>> I have one idea, why it might be the issue. >>>> >>>> during image creation step, I have hadd missing packets: >>>> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >>>> PCS thing can be found in HA repo, so I enabled it, but >>>> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >>>> >>>> I believe that is a case... >>>> so it installed non CentOS8 maintained kvm or some dependent >>>> packages.... >>>> >>>> How can I get osops-tools-monitoring-oschecks from centos repos? it is >>>> last seen in CentOS7 repos.... >>>> >>>> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >>>> osops-tools-monitoring-oschecks -A2 >>>> osops-tools-monitoring-oschecks.noarch >>>> 0.0.1-0.20191202171903.bafe3f0.el7 >>>> >>>> rdo-trunk-train-tested >>>> ostree-debuginfo.x86_64 2019.1-2.el7 >>>> base-debuginfo >>>> (undercloud) [stack at ironic-poc ~]$ >>>> >>>> can I somehow not include that package in image creation? OR if it is >>>> essential, can I create a different repo for that one? >>>> >>>> >>>> >>>> >>>> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> Hi all! >>>>> >>>>> Here we go, we are in the second part of this interesting >>>>> troubleshooting! >>>>> >>>>> 1) I have LogTool setup.Thank you Arkady. >>>>> >>>>> 2) I have user OSP to create instance, and I have used virsh to create >>>>> instance. >>>>> 2.1) OSP way is failing in either way, if it is volume-based or >>>>> image-based, it is failing either way.. [1] and [2] >>>>> 2.2) when I create it using CLI: [0] [3] >>>>> >>>>> any ideas what can be wrong? What options I should choose? >>>>> I have one network/vlan for whole cloud. I am doing proof of concept >>>>> of remote booting, so I do not have br-ex setup. and I do not have >>>>> br-provider. >>>>> >>>>> There is my compute[5] and controller[6] yaml files, Please help, how >>>>> it should look like so it would have br-ex and br-int connected? as >>>>> br-int now is in UNKNOWN state. And br-ex do not exist. >>>>> As I understand, in roles data yaml, when we have tag external it >>>>> should create br-ex? or am I wrong? >>>>> >>>>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>>>> running. >>>>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute >>>>> logs >>>>> [2] http://paste.openstack.org/show/795431/ < controller logs >>>>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>>>> [4] http://paste.openstack.org/show/795433/ < xml file for >>>>> [5] >>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>>>> [6] >>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>>>> >>>>> >>>>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>>>> wrote: >>>>> >>>>>> Hi all! >>>>>> >>>>>> I was able to analyze the attached log files and I hope that the >>>>>> results may help you understand what's going wrong with instance creation. >>>>>> You can find *Log_Tool's unique exported Error blocks* here: >>>>>> http://paste.openstack.org/show/795356/ >>>>>> >>>>>> *Some statistics and problematical messages:* >>>>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>>>> since: 2020-06-30 12:30:00 ##### >>>>>> Total_Number_Of_Errors --> 9 >>>>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>>>> >>>>>> *nova-compute.log* >>>>>> *default default] Error launching a defined domain with XML: >>>>> type='kvm'>* >>>>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>>>> 69134106b56941698e58c61... >>>>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: >>>>>> internal *error*: qemu unexpectedly closed the monitor: >>>>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... >>>>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>>>> set MSR 0x48e to 0xfff9fffe04006172* >>>>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>>>> recent call last): >>>>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager >>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>>>> >>>>>> *server.log * >>>>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>>>> 422} returned with failed status* >>>>>> >>>>>> *ovn_controller.log* >>>>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>>>> network 'datacentre'* >>>>>> >>>>>> Thanks! >>>>>> >>>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>>>> >>>>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>>>> >>>>>>>>>> Thank you, I will try. I also modified a file, and it looked like >>>>>>>>>> it relaunched podman container once config was changed. Either way, if I >>>>>>>>>> understand Linux config correctly, the default value for user and group is >>>>>>>>>> root, if commented out: >>>>>>>>>> #user = "root" >>>>>>>>>> #group = "root" >>>>>>>>>> >>>>>>>>>> also in some logs, I saw, that it detected, that it is not AMD >>>>>>>>>> CPU :) and it is really not AMD CPU. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Just for fun, it might be important, here is how my node info >>>>>>>>>> looks. >>>>>>>>>> ComputeS01Parameters: >>>>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>>>> ComputeS01ExtraConfig: >>>>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>>>> _______________________________________________ >>>>>>>>>> >>>>>>>>>> >>>>> >>>> >>>> -- >>>> Ruslanas Gžibovskis >>>> +370 6030 7030 >>>> >>> >>> >>> -- >>> Ruslanas Gžibovskis >>> +370 6030 7030 >>> _______________________________________________ >>> users mailing list >>> users at lists.rdoproject.org >>> http://lists.rdoproject.org/mailman/listinfo/users >>> >>> To unsubscribe: users-unsubscribe at lists.rdoproject.org >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From amoralej at redhat.com Thu Jul 2 16:03:17 2020 From: amoralej at redhat.com (Alfredo Moralejo Alonso) Date: Thu, 2 Jul 2020 18:03:17 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 5:35 PM Alfredo Moralejo Alonso wrote: > > > On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis > wrote: > >> it is, i have image build failing. i can modify yaml used to create >> image. can you remind me which files it would be? >> >> > Right, I see that the patch must not be working fine for centos and the > package is being installed from delorean repos in the log. I guess it > needs an entry to cover the centos 8 case (i'm checking with opstools > maintainer). > https://review.opendev.org/739085 > As workaround I'd propose you to use the package from: > > > https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ > > or alternatively applying some local patch to tripleo-puppet-elements. > > >> and your question, "how it can impact kvm": >> >> in image most of the packages get deployed from deloren repos. I believe >> part is from centos repos and part of whole packages in >> overcloud-full.qcow2 are from deloren. so it might have bit different minor >> version, that might be incompactible... at least it have happend for me >> previously with train release so i used tested ci fully from the >> beginning... >> I might be for sure wrong. >> > > Delorean repos contain only OpenStack packages, things like nova, etc... > not kvm or things included in CentOS repos. KVM will always installed which > should be installed from "Advanced Virtualization" repository. May you > check what versions of qemu-kvm and libvirt you got installed into the > overcloud-full image?, it should match with the versions in: > > > http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ > > like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm > > >> >> On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, >> wrote: >> >>> >>> >>> On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis >>> wrote: >>> >>>> by the way in CentOS8, here is an error message I receive when >>>> searching around >>>> >>>> [stack at rdo-u ~]$ dnf list --enablerepo="*" --disablerepo >>>> "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks >>>> Errors during downloading metadata for repository >>>> 'rdo-trunk-ussuri-tested': >>>> - Status code: 403 for >>>> https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repomd.xml >>>> (IP: 3.87.151.16) >>>> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': >>>> Cannot download repomd.xml: Cannot download repodata/repomd.xml: All >>>> mirrors were tried >>>> [stack at rdo-u ~]$ >>>> >>>> >>> Yep, rdo-trunk-ussuri-tested repo included in the release rpm is >>> disabled by default and not longer usable (i'll send a patch to retire it), >>> don't enable it. >>> >>> Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead >>> to install CentOS8 maintained kvm. BTW, i think that package should not be >>> required in CentOS8: >>> >>> >>> https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d0939ac0cebedac7bda3398def >>> >>> >>> >>> >>>> On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> Hi All, >>>>> >>>>> I have one idea, why it might be the issue. >>>>> >>>>> during image creation step, I have hadd missing packets: >>>>> pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs >>>>> PCS thing can be found in HA repo, so I enabled it, but >>>>> "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8... >>>>> >>>>> I believe that is a case... >>>>> so it installed non CentOS8 maintained kvm or some dependent >>>>> packages.... >>>>> >>>>> How can I get osops-tools-monitoring-oschecks from centos repos? it >>>>> is last seen in CentOS7 repos.... >>>>> >>>>> $ yum list --enablerepo=* --disablerepo "c7-media" | grep >>>>> osops-tools-monitoring-oschecks -A2 >>>>> osops-tools-monitoring-oschecks.noarch >>>>> 0.0.1-0.20191202171903.bafe3f0.el7 >>>>> >>>>> rdo-trunk-train-tested >>>>> ostree-debuginfo.x86_64 2019.1-2.el7 >>>>> base-debuginfo >>>>> (undercloud) [stack at ironic-poc ~]$ >>>>> >>>>> can I somehow not include that package in image creation? OR if it is >>>>> essential, can I create a different repo for that one? >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis >>>>> wrote: >>>>> >>>>>> Hi all! >>>>>> >>>>>> Here we go, we are in the second part of this interesting >>>>>> troubleshooting! >>>>>> >>>>>> 1) I have LogTool setup.Thank you Arkady. >>>>>> >>>>>> 2) I have user OSP to create instance, and I have used virsh to >>>>>> create instance. >>>>>> 2.1) OSP way is failing in either way, if it is volume-based or >>>>>> image-based, it is failing either way.. [1] and [2] >>>>>> 2.2) when I create it using CLI: [0] [3] >>>>>> >>>>>> any ideas what can be wrong? What options I should choose? >>>>>> I have one network/vlan for whole cloud. I am doing proof of concept >>>>>> of remote booting, so I do not have br-ex setup. and I do not have >>>>>> br-provider. >>>>>> >>>>>> There is my compute[5] and controller[6] yaml files, Please help, how >>>>>> it should look like so it would have br-ex and br-int connected? as >>>>>> br-int now is in UNKNOWN state. And br-ex do not exist. >>>>>> As I understand, in roles data yaml, when we have tag external it >>>>>> should create br-ex? or am I wrong? >>>>>> >>>>>> [0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is >>>>>> running. >>>>>> [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute >>>>>> logs >>>>>> [2] http://paste.openstack.org/show/795431/ < controller logs >>>>>> [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ >>>>>> [4] http://paste.openstack.org/show/795433/ < xml file for >>>>>> [5] >>>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS01.yaml >>>>>> [6] >>>>>> https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controller.yaml >>>>>> >>>>>> >>>>>> On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler >>>>>> wrote: >>>>>> >>>>>>> Hi all! >>>>>>> >>>>>>> I was able to analyze the attached log files and I hope that the >>>>>>> results may help you understand what's going wrong with instance creation. >>>>>>> You can find *Log_Tool's unique exported Error blocks* here: >>>>>>> http://paste.openstack.org/show/795356/ >>>>>>> >>>>>>> *Some statistics and problematical messages:* >>>>>>> ##### Statistics - Number of Errors/Warnings per Standard OSP log >>>>>>> since: 2020-06-30 12:30:00 ##### >>>>>>> Total_Number_Of_Errors --> 9 >>>>>>> /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 >>>>>>> /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 >>>>>>> /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7 >>>>>>> >>>>>>> *nova-compute.log* >>>>>>> *default default] Error launching a defined domain with XML: >>>>>> type='kvm'>* >>>>>>> 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager >>>>>>> [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b >>>>>>> 69134106b56941698e58c61... >>>>>>> 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: >>>>>>> internal *error*: qemu unexpectedly closed the monitor: >>>>>>> 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR >>>>>>> 0... >>>>>>> he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to >>>>>>> set MSR 0x48e to 0xfff9fffe04006172* >>>>>>> _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. >>>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most >>>>>>> recent call last): >>>>>>> 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager >>>>>>> [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File >>>>>>> "/usr/lib/python3.6/site-packages/nova/vir... >>>>>>> >>>>>>> *server.log * >>>>>>> 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': >>>>>>> 422} returned with failed status* >>>>>>> >>>>>>> *ovn_controller.log* >>>>>>> 272-2020-06-30T12:30:10.126079625+02:00 stderr F >>>>>>> 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for >>>>>>> network 'datacentre'* >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> Compute nodes are baremetal or virtualized?, I've seen similar bug >>>>>>>>>>>>> reports when using nested virtualization in other OSes. >>>>>>>>>>>>> >>>>>>>>>>>> baremetal. Dell R630 if to be VERY precise. >>>>>>>>>>> >>>>>>>>>>> Thank you, I will try. I also modified a file, and it looked >>>>>>>>>>> like it relaunched podman container once config was changed. Either way, if >>>>>>>>>>> I understand Linux config correctly, the default value for user and group >>>>>>>>>>> is root, if commented out: >>>>>>>>>>> #user = "root" >>>>>>>>>>> #group = "root" >>>>>>>>>>> >>>>>>>>>>> also in some logs, I saw, that it detected, that it is not AMD >>>>>>>>>>> CPU :) and it is really not AMD CPU. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Just for fun, it might be important, here is how my node info >>>>>>>>>>> looks. >>>>>>>>>>> ComputeS01Parameters: >>>>>>>>>>> NovaReservedHostMemory: 16384 >>>>>>>>>>> KernelArgs: "crashkernel=no rhgb" >>>>>>>>>>> ComputeS01ExtraConfig: >>>>>>>>>>> nova::cpu_allocation_ratio: 4.0 >>>>>>>>>>> nova::compute::libvirt::rx_queue_size: 1024 >>>>>>>>>>> nova::compute::libvirt::tx_queue_size: 1024 >>>>>>>>>>> nova::compute::resume_guests_state_on_host_boot: true >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> >>>>>>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Ruslanas Gžibovskis >>>>> +370 6030 7030 >>>>> >>>> >>>> >>>> -- >>>> Ruslanas Gžibovskis >>>> +370 6030 7030 >>>> _______________________________________________ >>>> users mailing list >>>> users at lists.rdoproject.org >>>> http://lists.rdoproject.org/mailman/listinfo/users >>>> >>>> To unsubscribe: users-unsubscribe at lists.rdoproject.org >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashlee at openstack.org Thu Jul 2 16:55:47 2020 From: ashlee at openstack.org (Ashlee Ferguson) Date: Thu, 2 Jul 2020 11:55:47 -0500 Subject: [Airship-discuss] [2020 Summit] Programming Committee Nominations Open In-Reply-To: <97832365-8405-4277-BFA6-64BB9F9C1F43@openstack.org> References: <97832365-8405-4277-BFA6-64BB9F9C1F43@openstack.org> Message-ID: <576987C3-647F-43C0-AE6E-F4F3C189DCE4@openstack.org> Hi everyone, Just a reminder that Programming Committee nominations for the 2020 Open Infrastructure Summit are open. If you’re an expert in any of the below categories, and would like to help program the Summit content, please fill out this form to nominate yourself or someone else: https://openstackfoundation.formstack.com/forms/programmingcommitteenom_summit2020 Thanks! Ashlee Ashlee Ferguson Community & Events Coordinator OpenStack Foundation > On Jun 24, 2020, at 12:06 PM, Ashlee Ferguson wrote: > > Programming Committee nominations for the 2020 Open Infrastructure Summit are open! > > Programming Committees for each Track will help build the Summit schedule, and are made up of individuals working in open infrastructure. Responsibilities include: > • Help the Summit team put together the best possible content based on your subject matter expertise > • Promote the individual Tracks within your networks > • Review the submissions and Community voting results in your particular Track > • Determine if there are any major content gaps in your Track, and if so, potentially solicit additional speakers directly to submit > • Ensure diversity of speakers and companies represented in your Track > • Avoid vendor sales pitches, focusing more on real-world user stories and technical, in-the-trenches experiences > > 2020 Summit Tracks: > • 5G, NFV & Edge > • AI, Machine Learning & HPC > • CI/CD > • Container Infrastructure > • Getting Started > • Hands-on Workshops > • Open Development > • Private & Hybrid Cloud > • Public Cloud > • Security > > If you’re interested in nominating yourself or someone else to be a member of the Summit Programming Committee for a specific Track, please fill out the nomination form[1]. Nominations will close on July 10, 2020. > > NOMINATION FORM[1] > > Programming Committee selections will occur before we open the Call for Presentations (CFP) to receive presentations so that the Committees can host office hours to consult on submissions, and help promote the event. > > The CFP will be open July 1 - August 4, 2020. > > Please email speakersupport at openstack.org with any questions or feedback. > > Cheers, > Ashlee > > [1] https://openstackfoundation.formstack.com/forms/programmingcommitteenom_summit2020 > > > Ashlee Ferguson > Community & Events Coordinator > OpenStack Foundation > > > _______________________________________________ > Airship-discuss mailing list > Airship-discuss at lists.airshipit.org > http://lists.airshipit.org/cgi-bin/mailman/listinfo/airship-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Jul 2 20:59:43 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 2 Jul 2020 22:59:43 +0200 Subject: [neutron] Drivers meeting agenda - 03.07.2020 Message-ID: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> Hi, For tomorrows drivers meeting we have 1 RFE to discuss: https://bugs.launchpad.net/neutron/+bug/1885921 - [RFE][floatingip port_forwarding] Add port ranges See You all on the meeting tomorrow. — Slawek Kaplonski Senior software engineer Red Hat From miguel at mlavalle.com Thu Jul 2 21:04:04 2020 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 2 Jul 2020 16:04:04 -0500 Subject: [neutron] Drivers meeting agenda - 03.07.2020 In-Reply-To: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> References: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> Message-ID: Hi Slawek, Saturday is 4th of July, the US Independence day. Many employers, like mine, are giving us tomorrow off. It may also be the case for the RH members of this team based in the US Cheers Miguel On Thu, Jul 2, 2020 at 3:59 PM Slawek Kaplonski wrote: > Hi, > > For tomorrows drivers meeting we have 1 RFE to discuss: > > https://bugs.launchpad.net/neutron/+bug/1885921 - [RFE][floatingip > port_forwarding] Add port ranges > > See You all on the meeting tomorrow. > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Thu Jul 2 21:22:57 2020 From: amy at demarco.com (Amy Marrich) Date: Thu, 2 Jul 2020 16:22:57 -0500 Subject: [Diversity] Diversity & Inclusion WG Meeting 7/6 Message-ID: The Diversity & Inclusion WG invites members of all OSF projects to our next meeting Monday, July 6th, at 17:00 UTC in the #openstack-diversity channel. The agenda can be found at https://etherpad.openstack.org/p/diversity -wg-agenda. We will be discussing changing our Wiki page to reflect the broader OSF projects and communities so that the page reflects our mission. Please feel free to add any other topics you wish to discuss at the meeting. Thanks, Amy (spotz) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Thu Jul 2 21:45:00 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 2 Jul 2020 14:45:00 -0700 Subject: [TC] [all] OSU Intern Work Message-ID: Hello! As you may or may not know, the OSF funded a student at Oregon State University last year to work on OpenStack part time. He did a lot of amazing work on Glance but sadly we are coming to the end of his internship as he will be graduating soon. I'm happy to report that we have the budget to fund another student part time to work on OpenStack again and I wanted to collect suggestions of projects/areas that a student could be helpful in. It is important to note, that they will only be working part time and, while I will be helping to mentor them, I will likely need a co-mentor in the area/topic to help me get them going, get their patches reviewed, answer questions as they go etc. Originally, I had thought about assigning them to Glance (like this past year) or Designate (like we had considered last year), but now I am thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a better fit? If you are interested in helping mentor a student in any of those areas or have a better idea I am all ears :) I look forward to your suggestions. -Kendall (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Thu Jul 2 21:52:44 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 2 Jul 2020 14:52:44 -0700 Subject: [all][TC] New Office Hours Times Message-ID: Hello! It's been a while since the office hours had been refreshed and we have a lot of new people on the TC that were not around when the times were set. In an effort to stir things up a bit, and get more community engagement, we are picking new times! I want to invite everyone in the community interested in interacting more with the TC to respond to the poll so we have your input as the office hours are really for your benefit anyway. (Nevermind the name of the poll :) Too much work to remake the whole thing just to rename it..) That said, we do need responses from ALL TC members so that we can also document who will (typically) be present for each office hour as well. (Also, thanks Mohammed for putting the poll together! It's no joke. ) -Kendall (diablo_rojo) [1] https://doodle.com/poll/q27t8pucq7b8xbme -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jul 2 22:08:12 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 02 Jul 2020 17:08:12 -0500 Subject: [TC] [all] OSU Intern Work In-Reply-To: References: Message-ID: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> ---- On Thu, 02 Jul 2020 16:45:00 -0500 Kendall Nelson wrote ---- > Hello! > As you may or may not know, the OSF funded a student at Oregon State University last year to work on OpenStack part time. He did a lot of amazing work on Glance but sadly we are coming to the end of his internship as he will be graduating soon. I'm happy to report that we have the budget to fund another student part time to work on OpenStack again and I wanted to collect suggestions of projects/areas that a student could be helpful in. > It is important to note, that they will only be working part time and, while I will be helping to mentor them, I will likely need a co-mentor in the area/topic to help me get them going, get their patches reviewed, answer questions as they go etc. > Originally, I had thought about assigning them to Glance (like this past year) or Designate (like we had considered last year), but now I am thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a better fit? If you are interested in helping mentor a student in any of those areas or have a better idea I am all ears :) > I look forward to your suggestions. Thanks Kendal for starting this. +100 for OSC help. Also we should consider upstream-investment-opportunities list which is our help needed things in community and we really look for some help on that since starting. For example, help on 'Consistent and Secure Policy Defaults' can be good thing to contribute which is a popup team in this cycle too[2], Raildo and myself can help for mentorship in this. [1] https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2020/index.html [2] https://governance.openstack.org/tc/reference/popup-teams.html#secure-default-policies -gmann > -Kendall (diablo_rojo) From kennelson11 at gmail.com Thu Jul 2 22:18:49 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 2 Jul 2020 15:18:49 -0700 Subject: [TC] [all] OSU Intern Work In-Reply-To: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> References: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> Message-ID: On Thu, Jul 2, 2020 at 3:08 PM Ghanshyam Mann wrote: > ---- On Thu, 02 Jul 2020 16:45:00 -0500 Kendall Nelson < > kennelson11 at gmail.com> wrote ---- > > Hello! > > As you may or may not know, the OSF funded a student at Oregon State > University last year to work on OpenStack part time. He did a lot of > amazing work on Glance but sadly we are coming to the end of his internship > as he will be graduating soon. I'm happy to report that we have the budget > to fund another student part time to work on OpenStack again and I wanted > to collect suggestions of projects/areas that a student could be helpful > in. > > It is important to note, that they will only be working part time and, > while I will be helping to mentor them, I will likely need a co-mentor in > the area/topic to help me get them going, get their patches reviewed, > answer questions as they go etc. > > Originally, I had thought about assigning them to Glance (like this > past year) or Designate (like we had considered last year), but now I am > thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a > better fit? If you are interested in helping mentor a student in any of > those areas or have a better idea I am all ears :) > > I look forward to your suggestions. > > Thanks Kendal for starting this. > > +100 for OSC help. > > Also we should consider upstream-investment-opportunities list which is > our help needed things in community and > we really look for some help on that since starting. For example, help on > 'Consistent and Secure Policy Defaults' can > be good thing to contribute which is a popup team in this cycle too[2], > Raildo and myself can help for mentorship in this. > I will definitely take a look at the list, but my understanding was that we wanted someone to work on those things that would be sticking around a little more long term and full time? I can only guarantee the student will be around for the school year and only part time. If I'm wrong, I can definitely rank the policy work a little higher on my list :) > [1] > https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2020/index.html > [2] > https://governance.openstack.org/tc/reference/popup-teams.html#secure-default-policies > > -gmann > > > > -Kendall (diablo_rojo) > -Kendall (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Fri Jul 3 07:53:10 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 3 Jul 2020 09:53:10 +0200 Subject: [neutron] Drivers meeting agenda - 03.07.2020 In-Reply-To: References: <02B0BD23-241B-41A4-A02E-C8B8DD6C99C5@redhat.com> Message-ID: <4B3B5595-3BB9-48C0-ACBB-87C3C8A58AD7@redhat.com> Hi, Thx for info Miguel. I didn’t know that You have day off on Friday. I’m not sure if that is the case for others too. Lets see if we will have quorum on the meeting then. If not, we will skip it for this week :) Have a great long weekend :) > On 2 Jul 2020, at 23:04, Miguel Lavalle wrote: > > Hi Slawek, > > Saturday is 4th of July, the US Independence day. Many employers, like mine, are giving us tomorrow off. It may also be the case for the RH members of this team based in the US > > Cheers > > Miguel > > On Thu, Jul 2, 2020 at 3:59 PM Slawek Kaplonski wrote: > Hi, > > For tomorrows drivers meeting we have 1 RFE to discuss: > > https://bugs.launchpad.net/neutron/+bug/1885921 - [RFE][floatingip port_forwarding] Add port ranges > > See You all on the meeting tomorrow. > > — > Slawek Kaplonski > Senior software engineer > Red Hat > — Slawek Kaplonski Senior software engineer Red Hat From moguimar at redhat.com Fri Jul 3 08:28:27 2020 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Fri, 3 Jul 2020 10:28:27 +0200 Subject: [oslo] PTO on Monday In-Reply-To: <6e2dc5a8-434d-380a-241c-5b29d26f8f12@nemebean.com> References: <6e2dc5a8-434d-380a-241c-5b29d26f8f12@nemebean.com> Message-ID: Monday will also be a holiday in the Czech Republic. On Wed, Jul 1, 2020 at 11:58 PM Ben Nemec wrote: > Hi Oslo, > > I'm making this a four day weekend (Friday is a US holiday), so I won't > be around for the meeting on Monday. If someone else wants to run it > then feel free to hold it without me. Otherwise we'll return to the > regular schedule the following week. > > -Ben > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaronzhu1121 at gmail.com Fri Jul 3 09:08:03 2020 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Fri, 3 Jul 2020 17:08:03 +0800 Subject: [Telemetry] Propose Matthias Runge for Telemetry core In-Reply-To: References: Message-ID: Welcome Matthias, I have added you to the ceilometer core team. Lingxian Kong 于2020年6月24日 周三09:57写道: > +1 welcome! > > --- > Lingxian Kong > Senior Software Engineer > Catalyst Cloud > www.catalystcloud.nz > > > On Tue, Jun 23, 2020 at 11:47 PM Rong Zhu wrote: > >> Hello all, >> >> Matthias Runge have been very active in the repository with patches and >> reviews. >> So I would like to propose adding Matthias as core developer for the >> telemetry project. >> >> Please, feel free to add your votes into the thread. >> -- >> Thanks, >> Rong Zhu >> > -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... URL: From samuel.mutel at gmail.com Fri Jul 3 09:25:01 2020 From: samuel.mutel at gmail.com (Samuel Mutel) Date: Fri, 3 Jul 2020 11:25:01 +0200 Subject: [Telemetry] Error when sending to prometheus pushgateway Message-ID: Hello, I have two questions about ceilometer (openstack version rocky). - First of all, it seems that ceilometer is sending metrics every hour and I don't understand why. - Next, I am not able to setup ceilometer to send metrics to prometheus pushgateway. Here is my configuration: > sources: > - name: meter_file > interval: 30 > meters: > - "*" > sinks: > - prometheus > > sinks: > - name: prometheus > publishers: > - prometheus://10.60.4.11:9091/metrics/job/ceilometer > Here is the error I received: > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 > # TYPE memory gauge > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2048 > # TYPE disk.ephemeral.size gauge > disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > # TYPE disk.root.size gauge > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > : HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http Traceback > (most recent call last): > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", line 178, > in _do_post > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > res.raise_for_status() > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http File > "/usr/lib/python2.7/dist-packages/requests/models.py", line 935, in > raise_for_status > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http raise > HTTPError(http_error_msg, response=self) > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http HTTPError: > 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > Thanks for your help on this topic. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrunge at matthias-runge.de Fri Jul 3 10:05:24 2020 From: mrunge at matthias-runge.de (Matthias Runge) Date: Fri, 3 Jul 2020 12:05:24 +0200 Subject: [Telemetry] Propose Matthias Runge for Telemetry core In-Reply-To: References: Message-ID: <9e445a50-3d6b-df74-ab69-b218016cbae0@matthias-runge.de> On 03/07/2020 11:08, Rong Zhu wrote: > Welcome Matthias, I have added you to the ceilometer core team. > > Lingxian Kong >于2020 > 年6月24日 周三09:57写道: Thank you, I feel honored. Matthias > > +1 welcome! > > --- > Lingxian Kong > Senior Software Engineer > Catalyst Cloud > www.catalystcloud.nz > > > On Tue, Jun 23, 2020 at 11:47 PM Rong Zhu > wrote: > > Hello all, > > Matthias Runge have been very active in the repository with > patches and reviews. > So I would like to propose adding Matthias as core developer for > the telemetry project. > > Please, feel free to add your votes into the thread. > -- > Thanks, > Rong Zhu > > -- > Thanks, > Rong Zhu From mrunge at matthias-runge.de Fri Jul 3 10:10:29 2020 From: mrunge at matthias-runge.de (Matthias Runge) Date: Fri, 3 Jul 2020 12:10:29 +0200 Subject: [Telemetry] Error when sending to prometheus pushgateway In-Reply-To: References: Message-ID: <731c90df-8830-1804-10a8-a9a97a3e2f55@matthias-runge.de> On 03/07/2020 11:25, Samuel Mutel wrote: > Hello, > > I have two questions about ceilometer (openstack version rocky). > > * First of all, it seems that ceilometer is sending metrics every hour > and I don't understand why. > * Next, I am not able to setup ceilometer to send metrics to > prometheus pushgateway. > > Here is my configuration: > > sources: >   - name: meter_file >     interval: 30 >     meters: >       - "*" >     sinks: >       - prometheus > > sinks: >   - name: prometheus >     publishers: >             - prometheus://10.60.4.11:9091/metrics/job/ceilometer > > > > Here is the error I received: > > vcpus{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2 > # TYPE memory gauge > memory{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 2048 > # TYPE disk.ephemeral.size gauge > disk.ephemeral.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} > 0 > # TYPE disk.root.size gauge > disk.root.size{resource_id="7fab268b-ca7c-4692-a103-af4a69f817e4"} 0 > : HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > Traceback (most recent call last): > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http   File > "/usr/lib/python2.7/dist-packages/ceilometer/publisher/http.py", > line 178, in _do_post > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http     > res.raise_for_status() > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http   File > "/usr/lib/python2.7/dist-packages/requests/models.py", line 935, in > raise_for_status > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http     > raise HTTPError(http_error_msg, response=self) > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > HTTPError: 400 Client Error: Bad Request for url: > http://10.60.4.11:9091/metrics/job/ceilometer > 2020-07-01 17:00:12.272 11375 ERROR ceilometer.publisher.http > > > Thanks for your help on this topic. Hi, first obvious question: are you sure that there is something listening under http://10.60.4.11:9091/metrics/job/ceilometer ? Would you have some error logs from the other side? It seems that ceilometer is trying to dispatch as expected. Matthias From anost1986 at gmail.com Fri Jul 3 11:20:20 2020 From: anost1986 at gmail.com (Andrii Ostapenko) Date: Fri, 3 Jul 2020 06:20:20 -0500 Subject: [loci][helm][k8s] When do images on docker.io get updated Message-ID: Hello Corne, OSH uses images built using gates in openstack/openstack-helm-images repository, not in loci itself. You may want to add a definition for watcher image similar to [0] and then refer to it in the corresponding release job, e.g. for Stein [1]. After your commit to openstack-helm-images is merged, new images will be published to docker.io/openstackhelm/watcher repository and can be used in the way you referenced them in your OSH commit [2]. [0] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L269-L279 [1] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L454-L481 [2] https://review.opendev.org/#/c/720140/ From info at dantalion.nl Fri Jul 3 12:34:28 2020 From: info at dantalion.nl (info at dantalion.nl) Date: Fri, 3 Jul 2020 14:34:28 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: References: Message-ID: Hello Andrii, I understand, this is unfortunate however as when I previously asked I was told that it could be achieved both using loci or openstack-helm-images. Seeing how the loci patch took around 7 months to merge I have now faced quite some delays. I will submit the patch to openstack-helm-images soon, thanks for clarifying. PS: Octavia is still using loci images for openstack-helm is that something that should be updated? https://opendev.org/openstack/openstack-helm/src/branch/master/octavia/values.yaml#L54 King regards, Corne Lukken On 03-07-2020 13:20, Andrii Ostapenko wrote: > Hello Corne, > > OSH uses images built using gates in openstack/openstack-helm-images > repository, not in loci itself. You may want to add a definition for > watcher image similar to [0] and then refer to it in the corresponding > release job, e.g. for Stein [1]. > > After your commit to openstack-helm-images is merged, new images will > be published to docker.io/openstackhelm/watcher repository and can be > used in the way you referenced them in your OSH commit [2]. > > [0] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L269-L279 > [1] https://opendev.org/openstack/openstack-helm-images/src/branch/master/zuul.d/openstack-loci.yaml#L454-L481 > [2] https://review.opendev.org/#/c/720140/ > From amotoki at gmail.com Fri Jul 3 13:39:28 2020 From: amotoki at gmail.com (Akihiro Motoki) Date: Fri, 3 Jul 2020 22:39:28 +0900 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: On Thu, Jul 2, 2020 at 10:37 PM Ruby Loo wrote: > > Hi, > > On Tue, Jun 30, 2020 at 10:53 PM Akihiro Motoki wrote: >> >> On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona wrote: >> > >> > Hi, >> > Simplification sounds good (I do not take into considerations like "no code fanatic movements" or similar). >> > How this could affect upgrade, I am sure there are deployments older than pike, and those at a point will >> > got for some newer version (I hope we can give them good answers for their problems as Openstack) >> > >> > What do you think about stadium projects? As those have much less activity (as mostly solve one rather specific problem), >> > and much less migration scripts shall we just "merge" those to init ops? >> > I checked quickly a few stadium project and only bgpvpn has newer migration scripts than pike. >> >> In my understanding, squashing migrations can be done repository by repository. >> A revision hash of each migration is not changed and head revisions >> are stored in the database per repository, so it should work. >> For initial deployments, neutron-db-manage runs all db migrations from >> the initial revision to a specified revision (release), so it has no >> problem. >> For upgrade scenarios, this change just means that we just dropped >> support upgrade from releases included in squashed migrations. >> For example, if we squash migrations up to rocky (and create >> rocky_initial migration) in the neutron repo, we no longer support db >> migration from releases before rocky. This would be the only >> difference I see. > > > > I wonder if this is acceptable (that an OpenStack service will not support db migrations prior to rocky). What is (or is there?) OpenStack's stance wrt support for upgrades? We are using ocata and plan on upgrading but we don't know when that might happen :-( > > --ruby It is not true. What we the upstream community recommend is to upgrade the controller node and databases in the fast-foward upgrade manner. Even if the upstream repository just provides database migration from for example Rocky, you can upgrade from a release older than rocky, by upgrading one release by one. In addition, by keeping a specific number of releases in db migrations, operators can still upgrade from more than one old release (if they want). --amotoki From gmann at ghanshyammann.com Fri Jul 3 14:48:50 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 03 Jul 2020 09:48:50 -0500 Subject: [TC] [all] OSU Intern Work In-Reply-To: References: <1731192af32.114599dad364460.4004111077412968560@ghanshyammann.com> Message-ID: <1731526ca3b.11b1ddef7398808.2475388787302983606@ghanshyammann.com> ---- On Thu, 02 Jul 2020 17:18:49 -0500 Kendall Nelson wrote ---- > > > On Thu, Jul 2, 2020 at 3:08 PM Ghanshyam Mann wrote: > ---- On Thu, 02 Jul 2020 16:45:00 -0500 Kendall Nelson wrote ---- > > Hello! > > As you may or may not know, the OSF funded a student at Oregon State University last year to work on OpenStack part time. He did a lot of amazing work on Glance but sadly we are coming to the end of his internship as he will be graduating soon. I'm happy to report that we have the budget to fund another student part time to work on OpenStack again and I wanted to collect suggestions of projects/areas that a student could be helpful in. > > It is important to note, that they will only be working part time and, while I will be helping to mentor them, I will likely need a co-mentor in the area/topic to help me get them going, get their patches reviewed, answer questions as they go etc. > > Originally, I had thought about assigning them to Glance (like this past year) or Designate (like we had considered last year), but now I am thinking the User Facing API work (OpenStackSDK/OSC/et al) might be a better fit? If you are interested in helping mentor a student in any of those areas or have a better idea I am all ears :) > > I look forward to your suggestions. > > Thanks Kendal for starting this. > > +100 for OSC help. > > Also we should consider upstream-investment-opportunities list which is our help needed things in community and > we really look for some help on that since starting. For example, help on 'Consistent and Secure Policy Defaults' can > be good thing to contribute which is a popup team in this cycle too[2], Raildo and myself can help for mentorship in this. > > I will definitely take a look at the list, but my understanding was that we wanted someone to work on those things that would be sticking around a little more long term and full time? I can only guarantee the student will be around for the school year and only part time. > If I'm wrong, I can definitely rank the policy work a little higher on my list :) Thanks, part time help will be valuable too in policy work. For example, doing it for 1-2 projects (who has small set of policies) can be good progress. -gmann > > [1] https://governance.openstack.org/tc/reference/upstream-investment-opportunities/2020/index.html > [2] https://governance.openstack.org/tc/reference/popup-teams.html#secure-default-policies > > -gmann > > > > -Kendall (diablo_rojo) > > -Kendall (diablo_rojo) From ildiko.vancsa at gmail.com Fri Jul 3 14:52:50 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Fri, 3 Jul 2020 16:52:50 +0200 Subject: [cyborg] Incomplete v2 API in Train Message-ID: <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> Hi Cyborg Team, I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration resources. We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be possible to bring this up and discuss on an upcoming Cyborg team meeting? Thanks, Ildikó [1] https://www.lfnetworking.org/about/cntt/ [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 From mnaser at vexxhost.com Fri Jul 3 16:05:21 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 3 Jul 2020 12:05:21 -0400 Subject: [TC] Monthly Meeting Summary Message-ID: Hi everyone, Here’s a summary of what happened in our TC monthly meeting last Thursday, July 2nd. # ATTENDEES (LINES SAID) - mnaser (106) - gmann (43) - evrardjp (32) - diablo_rojo (32) - njohnston (11) - jungleboyj (8) - openstack (7) - ricolin (6) - fungi (4) - ttx (2) - clarkb (2) - belmoreira (1) - knikolla (1) - AJaeger (1) # MEETING SUMMARY - Rollcall (mnaser, 14:00:40) - Follow up on past action items (mnaser, 14:04:59) - OpenStack Foundation OSU Intern Project (diablo_rojo) (mnaser, 14:26:55) - W cycle goal selection start (mnaser, 14:36:02) - https://governance.openstack.org/tc/goals/#goal-selection-schedule (gmann, 14:37:37) - Completion of retirement cleanup (gmann) (mnaser, 14:48:36) - https://etherpad.opendev.org/p/tc-retirement-cleanup is a scratch pad; nothing pushed out towards the community (mnaser, 14:48:59) # ACTION ITEMS - evrardjp & njohnston to start writing resolution about how deconstructed PTL role - mnaser to find the owner to start using facing API pop-up team over ML - gmann update goal selection docs to clarify the goal count - gmann start discussion around reviewing currenet tags - mnaser propose change to implement weekly meetings - diablo_rojo start discussion on ML around potential items for OSF funded intern - njohnston and mugsie to work on getting goals groomed/proposed for W cycle - TC and community to help finish properly and cleanly retiring projects To read the full logs of the meeting, please refer to http://eavesdrop.openstack.org/meetings/tc/2020/tc.2020-07-02-14.00.log.html. -- Mohammed Naser VEXXHOST, Inc. From ionut at fleio.com Fri Jul 3 16:19:29 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 3 Jul 2020 19:19:29 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi Rafael, I think I applied all the reviews successfully but I tried to do an octavia dynamic poller but I have couples of errors. Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ if i remove the - in front of name like this: https://paste.xinu.at/K7s5I8/ The error is different this time: https://paste.xinu.at/zWdC0U/ Is there something I missed or is something wrong in yaml? On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > > Since the merging window for ussuri was long passed for those commits, is >> it safe to assume that it will not land in stable/ussuri at all and those >> will be available for victoria? >> > > I would say so. We are lacking people to review and then merge it. > > How safe is to cherry pick those commits and use them in production? >> > As long as the person executing the cherry-picks, and maintaining the code > knows what she/he is doing, you should be safe. The guys that are using > this implementation (and others that I and my colleagues proposed), have a > few openstack components that are customized with the > patches/enhancements/extensions we developed so far; this means, they are > not using the community version, but something in-between (the community > releases + the patches we did). Of course, it is only possible, because we > are the ones creating and maintaining these codes; therefore, we can assure > quality for production. > > > > > On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: > >> Hello Rafael, >> >> Since the merging window for ussuri was long passed for those commits, is >> it safe to assume that it will not land in stable/ussuri at all and those >> will be available for victoria? >> >> How safe is to cherry pick those commits and use them in production? >> >> >> >> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>> However, there are some important PRs still waiting for a merge, that might >>> be important for your use case: >>> * https://review.opendev.org/#/c/722092/ >>> * https://review.opendev.org/#/c/715180/ >>> * https://review.opendev.org/#/c/715289/ >>> * https://review.opendev.org/#/c/679999/ >>> * https://review.opendev.org/#/c/709807/ >>> >>> >>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves >>> wrote: >>> >>>> >>>> >>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>> >>>>> Hello, >>>>> >>>>> I want to meter the loadbalancer into gnocchi for billing purposes in >>>>> stein/train and ceilometer doesn't support dynamic pollsters. >>>>> >>>> >>>> I think I misunderstood your use case, sorry. I read it as if you >>>> wanted to know "if a loadbalancer was deployed and has status active". >>>> >>>> >>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>> >>>> >>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>>> Ceilometer project. >>>> >>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>> cgoncalves at redhat.com> wrote: >>>>> >>>>>> Hi Ionut, >>>>>> >>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>>>> >>>>>>> Hello guys, >>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer the >>>>>>> following: >>>>>>> - network.services.lb.active.connections >>>>>>> - network.services.lb.health_monitor >>>>>>> - network.services.lb.incoming.bytes >>>>>>> - network.services.lb.listener >>>>>>> - network.services.lb.loadbalancer >>>>>>> - network.services.lb.member >>>>>>> - network.services.lb.outgoing.bytes >>>>>>> - network.services.lb.pool >>>>>>> - network.services.lb.total.connections >>>>>>> >>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>> supported in neutron. >>>>>>> >>>>>>> I found >>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>> but this is not available in stein or train. >>>>>>> >>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>> octavia. >>>>>>> I mostly want for start to just meter if a loadbalancer was deployed >>>>>>> and has status active. >>>>>>> >>>>>> >>>>>> You can get the provisioning and operating status of Octavia load >>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>> three API endpoints for statistics [2][3][4]. >>>>>> >>>>>> I hope this helps with your use case. >>>>>> >>>>>> Cheers, >>>>>> Carlos >>>>>> >>>>>> [1] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>> [2] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>> [3] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>> [4] >>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jul 3 16:32:55 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 3 Jul 2020 16:32:55 +0000 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: References: Message-ID: <20200703163255.rlotrtjbwjlxwt4o@yuggoth.org> On 2020-07-03 14:34:28 +0200 (+0200), info at dantalion.nl wrote: [...] > I understand, this is unfortunate however as when I previously > asked I was told that it could be achieved both using loci or > openstack-helm-images. Seeing how the loci patch took around 7 > months to merge I have now faced quite some delays. > > I will submit the patch to openstack-helm-images soon, thanks for > clarifying. [...] Be aware that the loci team basically dissolved a year or two back and development mostly ground to a halt. The loci deliverable was folded into the openstack-helm team a couple months ago because they still depend on it in some places, so they've committed to keep it on life support for now. At this point I would assume whatever the openstack-helm team is focusing on will receive better support than loci, unless they're actively working to use loci more. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ionut at fleio.com Fri Jul 3 16:59:37 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 3 Jul 2020 19:59:37 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi, I just noticed that the example dynamic.network.services.vpn.connection from https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has the wrong indentation. This https://paste.xinu.at/6PTfsM/ is loaded without any error. Now I have to see why is not polling from it On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: > Hi Rafael, > > I think I applied all the reviews successfully but I tried to do an > octavia dynamic poller but I have couples of errors. > > Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ > Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ > > if i remove the - in front of name like this: > https://paste.xinu.at/K7s5I8/ > The error is different this time: https://paste.xinu.at/zWdC0U/ > > Is there something I missed or is something wrong in yaml? > > > On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> >> Since the merging window for ussuri was long passed for those commits, is >>> it safe to assume that it will not land in stable/ussuri at all and those >>> will be available for victoria? >>> >> >> I would say so. We are lacking people to review and then merge it. >> >> How safe is to cherry pick those commits and use them in production? >>> >> As long as the person executing the cherry-picks, and maintaining the >> code knows what she/he is doing, you should be safe. The guys that are >> using this implementation (and others that I and my colleagues proposed), >> have a few openstack components that are customized with the >> patches/enhancements/extensions we developed so far; this means, they are >> not using the community version, but something in-between (the community >> releases + the patches we did). Of course, it is only possible, because we >> are the ones creating and maintaining these codes; therefore, we can assure >> quality for production. >> >> >> >> >> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >> >>> Hello Rafael, >>> >>> Since the merging window for ussuri was long passed for those commits, >>> is it safe to assume that it will not land in stable/ussuri at all and >>> those will be available for victoria? >>> >>> How safe is to cherry pick those commits and use them in production? >>> >>> >>> >>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>> However, there are some important PRs still waiting for a merge, that might >>>> be important for your use case: >>>> * https://review.opendev.org/#/c/722092/ >>>> * https://review.opendev.org/#/c/715180/ >>>> * https://review.opendev.org/#/c/715289/ >>>> * https://review.opendev.org/#/c/679999/ >>>> * https://review.opendev.org/#/c/709807/ >>>> >>>> >>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves >>>> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I want to meter the loadbalancer into gnocchi for billing purposes in >>>>>> stein/train and ceilometer doesn't support dynamic pollsters. >>>>>> >>>>> >>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>> >>>>> >>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>> >>>>> >>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>>>> Ceilometer project. >>>>> >>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>> cgoncalves at redhat.com> wrote: >>>>>> >>>>>>> Hi Ionut, >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru wrote: >>>>>>> >>>>>>>> Hello guys, >>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>> the following: >>>>>>>> - network.services.lb.active.connections >>>>>>>> - network.services.lb.health_monitor >>>>>>>> - network.services.lb.incoming.bytes >>>>>>>> - network.services.lb.listener >>>>>>>> - network.services.lb.loadbalancer >>>>>>>> - network.services.lb.member >>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>> - network.services.lb.pool >>>>>>>> - network.services.lb.total.connections >>>>>>>> >>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>> supported in neutron. >>>>>>>> >>>>>>>> I found >>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>> but this is not available in stein or train. >>>>>>>> >>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>> octavia. >>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>> deployed and has status active. >>>>>>>> >>>>>>> >>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>>> three API endpoints for statistics [2][3][4]. >>>>>>> >>>>>>> I hope this helps with your use case. >>>>>>> >>>>>>> Cheers, >>>>>>> Carlos >>>>>>> >>>>>>> [1] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>> [2] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>> [3] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>> [4] >>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Fri Jul 3 19:13:04 2020 From: cboylan at sapwetik.org (Clark Boylan) Date: Fri, 03 Jul 2020 12:13:04 -0700 Subject: Setuptools 48 and Devstack Failures Message-ID: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> Hello, Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. Clark From rafaelweingartner at gmail.com Fri Jul 3 22:09:40 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Fri, 3 Jul 2020 19:09:40 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Good catch. I fixed the docs. https://review.opendev.org/#/c/739288/ On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: > Hi, > > I just noticed that the example dynamic.network.services.vpn.connection > from > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has > the wrong indentation. > This https://paste.xinu.at/6PTfsM/ is loaded without any error. > > Now I have to see why is not polling from it > > On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: > >> Hi Rafael, >> >> I think I applied all the reviews successfully but I tried to do an >> octavia dynamic poller but I have couples of errors. >> >> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >> >> if i remove the - in front of name like this: >> https://paste.xinu.at/K7s5I8/ >> The error is different this time: https://paste.xinu.at/zWdC0U/ >> >> Is there something I missed or is something wrong in yaml? >> >> >> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> >>> Since the merging window for ussuri was long passed for those commits, >>>> is it safe to assume that it will not land in stable/ussuri at all and >>>> those will be available for victoria? >>>> >>> >>> I would say so. We are lacking people to review and then merge it. >>> >>> How safe is to cherry pick those commits and use them in production? >>>> >>> As long as the person executing the cherry-picks, and maintaining the >>> code knows what she/he is doing, you should be safe. The guys that are >>> using this implementation (and others that I and my colleagues proposed), >>> have a few openstack components that are customized with the >>> patches/enhancements/extensions we developed so far; this means, they are >>> not using the community version, but something in-between (the community >>> releases + the patches we did). Of course, it is only possible, because we >>> are the ones creating and maintaining these codes; therefore, we can assure >>> quality for production. >>> >>> >>> >>> >>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>> >>>> Hello Rafael, >>>> >>>> Since the merging window for ussuri was long passed for those commits, >>>> is it safe to assume that it will not land in stable/ussuri at all and >>>> those will be available for victoria? >>>> >>>> How safe is to cherry pick those commits and use them in production? >>>> >>>> >>>> >>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>>> However, there are some important PRs still waiting for a merge, that might >>>>> be important for your use case: >>>>> * https://review.opendev.org/#/c/722092/ >>>>> * https://review.opendev.org/#/c/715180/ >>>>> * https://review.opendev.org/#/c/715289/ >>>>> * https://review.opendev.org/#/c/679999/ >>>>> * https://review.opendev.org/#/c/709807/ >>>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>> cgoncalves at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I want to meter the loadbalancer into gnocchi for billing purposes >>>>>>> in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>> >>>>>> >>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>> >>>>>> >>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>> >>>>>> >>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to the >>>>>> Ceilometer project. >>>>>> >>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>> cgoncalves at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Ionut, >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello guys, >>>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>>> the following: >>>>>>>>> - network.services.lb.active.connections >>>>>>>>> - network.services.lb.health_monitor >>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>> - network.services.lb.listener >>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>> - network.services.lb.member >>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>> - network.services.lb.pool >>>>>>>>> - network.services.lb.total.connections >>>>>>>>> >>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>> supported in neutron. >>>>>>>>> >>>>>>>>> I found >>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>> but this is not available in stein or train. >>>>>>>>> >>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>> octavia. >>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>> deployed and has status active. >>>>>>>>> >>>>>>>> >>>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>>>> three API endpoints for statistics [2][3][4]. >>>>>>>> >>>>>>>> I hope this helps with your use case. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Carlos >>>>>>>> >>>>>>>> [1] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>> [2] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>> [3] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>> [4] >>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Jul 3 22:29:18 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 03 Jul 2020 17:29:18 -0500 Subject: Setuptools 48 and Devstack Failures In-Reply-To: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> Message-ID: <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> ---- On Fri, 03 Jul 2020 14:13:04 -0500 Clark Boylan wrote ---- > Hello, > > Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. > > Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. > > Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. Yeah, I am not sure how it will go as setuptools bug or an incompatible change and needs to handle on devstack side. As this is blocking all gates, let's use the old setuptools temporarily. For now, I filed devstack bug to track it and once we figure it out then move to latest setuptools - https://bugs.launchpad.net/devstack/+bug/1886237 This is patch to use old setuptools- - https://review.opendev.org/#/c/739290/ > > Clark > > From gmann at ghanshyammann.com Sun Jul 5 01:24:55 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 04 Jul 2020 20:24:55 -0500 Subject: Setuptools 48 and Devstack Failures In-Reply-To: <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> Message-ID: <1731c9381f9.c3ec7029419955.5239287898505413558@ghanshyammann.com> ---- On Fri, 03 Jul 2020 17:29:18 -0500 Ghanshyam Mann wrote ---- > ---- On Fri, 03 Jul 2020 14:13:04 -0500 Clark Boylan wrote ---- > > Hello, > > > > Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. > > > > Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. > > > > Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. > > Yeah, I am not sure how it will go as setuptools bug or an incompatible change and needs to handle on devstack side. > As this is blocking all gates, let's use the old setuptools temporarily. For now, I filed devstack bug to track > it and once we figure it out then move to latest setuptools - https://bugs.launchpad.net/devstack/+bug/1886237 > > This is patch to use old setuptools- > - https://review.opendev.org/#/c/739290/ Updates: Issue is when setuptools adopts distutils from the standard library (in 48.0.0) and uses it, downstream packagers customization to distutils will be lost. - https://github.com/pypa/setuptools/issues/2232 setuptools 49.1.0 reverted the adoption of distutils from the standard library and its working now. I have closed the devstack bug 1886237 and proposed the revert of capping of setuptools by blacklisting 48.0.0 and 49.0.0 so that we test with latest setuptools. For now, devstack will pick the 49.1.0 and pass. - https://review.opendev.org/#/c/739294/2 In summary, gate is green and you can recheck on the failed patches. -gmann > > > > > Clark > > > > > > From gmann at ghanshyammann.com Sun Jul 5 18:36:48 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 05 Jul 2020 13:36:48 -0500 Subject: [all][tc][goals] Migrate CI/CD jobs to new Ubuntu LTS Focal: Week R-15 Update Message-ID: <17320443796.124ad0a06441329.7461862692885839223@ghanshyammann.com> Hello Everyone, Please find the week R-15 updates on 'Ubuntu Focal migration' community goal. Tracking: https://storyboard.openstack.org/#!/story/2007865 Progress: ======= * I have prepared the patched to migrate the unit/functional/doc/cover tox jobs to focal which are WIP till we finish the project side testing. This and its base patches - https://review.opendev.org/#/c/738328/ * devstack and tempest base patches are changed with Depends-On on 738328 tox job patch. This way we can test the complete gate (integration + unit +functional + doc + cover +pep8 + lower-constraint) jobs with a single testing patch by doing Depends-On: https://review.opendev.org/#/c/734700/ (or devstack base patch or tox one if you do not have tempest jobs to test) * I have started a few more project testing and found bugs on the incompatible deps versions for Focal. Please refer to the 'Bugs Report' section for details. Bugs Report: ========== Summary: Total 4 (1 fixed, 3 in-progress). 1. Bug#1882521. (IN-PROGRESS) There is open bug for nova/cinder where three tempest tests are failing for volume detach operation. There is no clear root cause found yet -https://bugs.launchpad.net/cinder/+bug/1882521 We have skipped the tests in tempest base patch to proceed with the other projects testing but this is blocking things for the migration. 2. We encountered the nodeset name conflict with x/tobiko. (FIXED) nodeset conflict is resolved now and devstack provides all focal nodes now. 3. Bug#1886296. (IN-PROGRESS) pyflakes till 2.1.0 is not compatible with python 3.8 which is the default python version on ubuntu focal[1]. With pep8 job running on focal faces the issue and fail. We need to bump the pyflakes to 2.1.1 as min version to run pep8 jobs on py3.8. As of now, many projects are using old hacking version so I am explicitly adding pyflakes>=2.1.1 on the project side[2] but for the long term easy maintenance, I am doing it in 'hacking' requirements.txt[3] nd will release a new hacking version. After that project can move to new hacking and do not need to maintain pyflakes version compatibility. 4. Bug#1886298. (IN-PROGRESS) 'Markupsafe' 1.0 is not compatible with the latest version of setuptools[4], We need to bump the lower-constraint for Markupsafe to 1.1.1 to make it work. There are a few more issues[5] with lower-constraint jobs which I am debugging. What work to be done on the project side: ================================ This goal is more of testing the jobs on focal and fixing bugs if any otherwise migrate jobs by switching the nodeset to focal node sets defined in devstack. 1. Start a patch in your repo by making depends-on on either of below: devstack base patch if you are using only devstack base jobs not tempest: https://review.opendev.org/#/c/731207/ OR tempest base patch if you are using the tempest base job (like devstack-tempest): https://review.opendev.org/#/c/734700/ Example: https://review.opendev.org/#/c/738126/ 2. If none of your project jobs override the nodeset then above patch will be testing patch(do not merge) otherwise change the nodeset to focal. Example: https://review.opendev.org/#/c/737370/ 3. If the jobs are defined in branchless repo and override the nodeset then you need to override the branches variant to adjust the nodeset so that those jobs run on Focal on victoria onwards only. If no nodeset is overridden then devstack being branched and stable base job using bionic/xenial will take care of this. Once we finish the testing on projects side and no failure then we will merge the devstack and tempest base patches. Important things to note: =================== * Do not forgot to add the story and task link to your patch so that we can track it smoothly. * Use gerrit topic 'migrate-to-focal' * Do not backport any of the patches. References: ========= Goal doc: https://governance.openstack.org/tc/goals/selected/victoria/migrate-ci-cd-jobs-to-ubuntu-focal.html Storyboard tracking: https://storyboard.openstack.org/#!/story/2007865 [1] https://github.com/PyCQA/pyflakes/issues/367 [2] https://review.opendev.org/#/c/739315/ [3] https://review.opendev.org/#/c/739334/ [4] https://github.com/pallets/markupsafe/issues/116 [5] https://zuul.opendev.org/t/openstack/build/7ecd9cf100194bc99b3b70fa1e6de032 -gmann From hongbin034 at gmail.com Sun Jul 5 19:47:58 2020 From: hongbin034 at gmail.com (Hongbin Lu) Date: Sun, 5 Jul 2020 15:47:58 -0400 Subject: [Neutron] Bug Deputy Report (June 29 - July 05) Message-ID: Hi all, Below is the bug deputy report for last week. Critical: * https://bugs.launchpad.net/neutron/+bug/1885900 test_trunk_subport_lifecycle is failing in ovn based jobs * https://bugs.launchpad.net/neutron/+bug/1885899 test_qos_basic_and_update test is failing High: * https://bugs.launchpad.net/neutron/+bug/1886116 slaac no longer works on IPv6 tenant subnets * https://bugs.launchpad.net/neutron/+bug/1885898 test connectivity through 2 routers fails in neutron-ovn-tempest-full-multinode-ovs-master job * https://bugs.launchpad.net/neutron/+bug/1885897 Tempest test_create_router_set_gateway_with_fixed_ip test is failing often in dvr scenario job * https://bugs.launchpad.net/neutron/+bug/1885695 [OVS] "vsctl" implementation does not allow empty transactions Medium: * https://bugs.launchpad.net/neutron/+bug/1885891 DB exception when updating a "ml2_port_bindings" object * https://bugs.launchpad.net/neutron/+bug/1885758 RPCMessage timeouts when ovs agent is reporting status about many ports Low: * https://bugs.launchpad.net/neutron/+bug/1886216 keepalived-state-change does not format correctly the logs * https://bugs.launchpad.net/neutron/+bug/1885547 [fullstack] OVS interface events isolation error with more than one OVS agent RFE: * https://bugs.launchpad.net/neutron/+bug/1885921 [RFE][floatingip port_forwarding] Add port ranges -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangbailin at inspur.com Mon Jul 6 02:14:41 2020 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Mon, 6 Jul 2020 02:14:41 +0000 Subject: =?utf-8?B?562U5aSNOiBbbGlzdHMub3BlbnN0YWNrLm9yZ+S7o+WPkV1bY3lib3JnXSBJ?= =?utf-8?Q?ncomplete_v2_API_in_Train?= In-Reply-To: <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> References: <68eefdd8dbf1a67a74233c0d02e5b6d8@sslemail.net> <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> Message-ID: Ildik, Cyborg officially completed the V2 version switch from the Ussuri version [1][2], and introduced microversion, you can refer to [3] or more information about the latest Cyborg V2 API. Sorry, we did not backport Device & Deployable V2 API to Train version. [1]https://specs.openstack.org/openstack/cyborg-specs/specs/ussuri/approved/cyborg-api.html [2]https://review.opendev.org/#/c/695648/, https://review.opendev.org/#/c/712835/ [3]https://docs.openstack.org/api-ref/accelerator/v2/index.html -----邮件原件----- 发件人: Ildiko Vancsa [mailto:ildiko.vancsa at gmail.com] 发送时间: 2020年7月3日 22:53 收件人: OpenStack Discuss 主题: [lists.openstack.org代发][cyborg] Incomplete v2 API in Train Hi Cyborg Team, I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration resources. We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be possible to bring this up and discuss on an upcoming Cyborg team meeting? Thanks, Ildikó [1] https://www.lfnetworking.org/about/cntt/ [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 From yumeng_bao at yahoo.com Mon Jul 6 03:09:08 2020 From: yumeng_bao at yahoo.com (yumeng bao) Date: Mon, 6 Jul 2020 11:09:08 +0800 Subject: [cyborg] Incomplete v2 API in Train References: <41297731-AD04-4A0E-9E21-56DD3FF90885.ref@yahoo.com> Message-ID: <41297731-AD04-4A0E-9E21-56DD3FF90885@yahoo.com>  Hi Ildikó, > Hi Cyborg Team, > I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration > resources. > We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct > the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. Yes,your understanding is correct,the v2 API implementation in Train is partial. I would update the documentation soon. I would recommend you to use the stable/ussuri version(instead of train release) of cyborg for two reasons: 1) API V2 in ussuri is complete while that of train is incomplete 2)the nova-cyborg integration[3] was not landed until Ussuri[4],so the integration in Train is also partial[5]. So if CNTT wants the complete accelerator management function,it would be better to use cyborg ussuri. > The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be > possible to bring this up and discuss on an upcoming Cyborg team meeting? Yes, sure. We can bring this up on the next weekly meeting on this Thursday 03:00 UTC at #openstack-cyborg, I have added this to meeting agenda[6]. > Thanks, > Ildikó > [1] https://www.lfnetworking.org/about/cntt/ > [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 [3]https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/nova-cyborg-interaction.html [4]https://releases.openstack.org/ussuri/highlights.html#cyborg [5]https://releases.openstack.org/train/highlights.html#cyborg [6]https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda Regards, Yumeng From ildiko.vancsa at gmail.com Mon Jul 6 06:30:26 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Mon, 6 Jul 2020 08:30:26 +0200 Subject: [cyborg] Incomplete v2 API in Train In-Reply-To: <41297731-AD04-4A0E-9E21-56DD3FF90885@yahoo.com> References: <41297731-AD04-4A0E-9E21-56DD3FF90885.ref@yahoo.com> <41297731-AD04-4A0E-9E21-56DD3FF90885@yahoo.com> Message-ID: <99AD36C6-98D1-4E2F-A811-002EF14D6944@gmail.com> Hi Yumeng, Thank you for the information. I will also attend the meeting this Thursday to get a full understanding and plans to work together with CNTT on acceleration management. Thanks, Ildikó > On Jul 6, 2020, at 05:09, yumeng bao wrote: > > > Hi Ildikó, > >> Hi Cyborg Team, > >> I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration > resources. > >> We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct > the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. > > Yes,your understanding is correct,the v2 API implementation in Train is partial. I would update the documentation soon. > I would recommend you to use the stable/ussuri version(instead of train release) of cyborg for two reasons: 1) API V2 in ussuri is complete while that of train is incomplete 2)the nova-cyborg integration[3] was not landed until Ussuri[4],so the integration in Train is also partial[5]. So if CNTT wants the complete accelerator management function,it would be better to use cyborg ussuri. > >> The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be > possible to bring this up and discuss on an upcoming Cyborg team meeting? > > Yes, sure. We can bring this up on the next weekly meeting on this Thursday 03:00 UTC at #openstack-cyborg, I have added this to meeting agenda[6]. > >> Thanks, >> Ildikó > >> [1] https://www.lfnetworking.org/about/cntt/ >> [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 > > > > [3]https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/nova-cyborg-interaction.html > [4]https://releases.openstack.org/ussuri/highlights.html#cyborg > [5]https://releases.openstack.org/train/highlights.html#cyborg > [6]https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda > > > Regards, > Yumeng > From ildiko.vancsa at gmail.com Mon Jul 6 06:31:11 2020 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Mon, 6 Jul 2020 08:31:11 +0200 Subject: =?utf-8?B?UmU6IFtsaXN0cy5vcGVuc3RhY2sub3Jn5Luj5Y+RXVtjeWJvcmdd?= =?utf-8?B?IEluY29tcGxldGUgdjIgQVBJIGluIFRyYWlu?= In-Reply-To: References: <68eefdd8dbf1a67a74233c0d02e5b6d8@sslemail.net> <79086EC5-4C79-4476-9AE9-579F99CBA1B2@gmail.com> Message-ID: Hi Brin, Thank you for the information, I will read through the links you provided and get back if I have more questions. Thanks, Ildikó > On Jul 6, 2020, at 04:14, Brin Zhang(张百林) wrote: > > Ildik, > > Cyborg officially completed the V2 version switch from the Ussuri version [1][2], and introduced microversion, you can refer to [3] or more information about the latest Cyborg V2 API. Sorry, we did not backport Device & > Deployable V2 API to Train version. > > [1]https://specs.openstack.org/openstack/cyborg-specs/specs/ussuri/approved/cyborg-api.html > [2]https://review.opendev.org/#/c/695648/, https://review.opendev.org/#/c/712835/ > [3]https://docs.openstack.org/api-ref/accelerator/v2/index.html > > -----邮件原件----- > 发件人: Ildiko Vancsa [mailto:ildiko.vancsa at gmail.com] > 发送时间: 2020年7月3日 22:53 > 收件人: OpenStack Discuss > 主题: [lists.openstack.org代发][cyborg] Incomplete v2 API in Train > > Hi Cyborg Team, > > I’m working with the CNTT community[1], they are working on building reference architecture for telecom workloads. Cyborg is important for their work to be able to utilize hardware acceleration resources. > > We are planning to use the Train version of OpenStack projects including Cyborg and it would be great to be able to switch to the v2 API as v1 is deprecated now. If my understanding is correct the v2 API implementation in Train is partial, but the documentation[2] doesn’t give accurate view about what is included. > > The CNTT team would like to be able to integrate and access the whole v2 API if that is possible. It would be great to discuss the options that we could use on the way forward. Would it be possible to bring this up and discuss on an upcoming Cyborg team meeting? > > Thanks, > Ildikó > > [1] https://www.lfnetworking.org/about/cntt/ > [2] https://docs.openstack.org/cyborg/train/api/api.html#v2-0 > > > From katonalala at gmail.com Mon Jul 6 07:11:32 2020 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 6 Jul 2020 09:11:32 +0200 Subject: [All][Neutron] Migrate old DB migration versions to init ops In-Reply-To: References: Message-ID: Hi, Exactly, it is not allowed to backport such a change to rocky for example, so on older branches the migration scripts will be there as I see, and you can upgrade to a release which support migration to Victoria for example. Regards lajoskatona Akihiro Motoki ezt írta (időpont: 2020. júl. 3., P, 15:39): > On Thu, Jul 2, 2020 at 10:37 PM Ruby Loo wrote: > > > > Hi, > > > > On Tue, Jun 30, 2020 at 10:53 PM Akihiro Motoki > wrote: > >> > >> On Tue, Jun 30, 2020 at 9:01 PM Lajos Katona > wrote: > >> > > >> > Hi, > >> > Simplification sounds good (I do not take into considerations like > "no code fanatic movements" or similar). > >> > How this could affect upgrade, I am sure there are deployments older > than pike, and those at a point will > >> > got for some newer version (I hope we can give them good answers for > their problems as Openstack) > >> > > >> > What do you think about stadium projects? As those have much less > activity (as mostly solve one rather specific problem), > >> > and much less migration scripts shall we just "merge" those to init > ops? > >> > I checked quickly a few stadium project and only bgpvpn has newer > migration scripts than pike. > >> > >> In my understanding, squashing migrations can be done repository by > repository. > >> A revision hash of each migration is not changed and head revisions > >> are stored in the database per repository, so it should work. > >> For initial deployments, neutron-db-manage runs all db migrations from > >> the initial revision to a specified revision (release), so it has no > >> problem. > >> For upgrade scenarios, this change just means that we just dropped > >> support upgrade from releases included in squashed migrations. > >> For example, if we squash migrations up to rocky (and create > >> rocky_initial migration) in the neutron repo, we no longer support db > >> migration from releases before rocky. This would be the only > >> difference I see. > > > > > > > > I wonder if this is acceptable (that an OpenStack service will not > support db migrations prior to rocky). What is (or is there?) OpenStack's > stance wrt support for upgrades? We are using ocata and plan on upgrading > but we don't know when that might happen :-( > > > > --ruby > > It is not true. What we the upstream community recommend is to upgrade > the controller node and databases in the fast-foward upgrade manner. > Even if the upstream repository just provides database migration from > for example Rocky, you can upgrade from a release older than rocky, by > upgrading one release by one. > In addition, by keeping a specific number of releases in db > migrations, operators can still upgrade from more than one old release > (if they want). > > --amotoki > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Mon Jul 6 07:46:24 2020 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Mon, 6 Jul 2020 09:46:24 +0200 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Hi all, Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. Dmitry On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: > Hi all, > > Since we're switching to 6 releases per year cadence, I think it makes > sense to have short virtual meetups after every release. The goal will be > to sync on priorities, exchange ideas and define plans for the upcoming 2 > months of development. Fooling around is also welcome! > > Please vote for the best 2 hours slot next week: > https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more > potential time zones, so apologies for so many options. Please cast your > vote until Friday, 12pm UTC, so that I can announce the final time slot > this week. > > Dmitry > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Mon Jul 6 08:46:13 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Mon, 6 Jul 2020 10:46:13 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: Hi Alfredo, since you mentioned, it is not essential to have that opstool, so I have replaced it with "sysstat" /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map so now it is: "default": { "oschecks_package": "sysstat" } And you are absolutely right regarding delorean, it took only OSP packages from that, and kvm and libvirt are at your specified versions. And then I believe, I found case for failing VM: "6536f105-3f38-41bd-9ddd-6702d23c4ccb] Instance failed to spawn: nova.exception.PortBindingFailed: Binding failed for port af8ecd79-ddb8-4ba1-990d-1ccdb76f1442, please check" so, my question is: I have only control (pxe) network, which is distributed between sites and OSP is having only one network (ControlPlane). How my controller and compute network should look like? My controller network looks like [1] and compute like [2]. When I uncomment in compute br-provider part, it do not deploy. does br-provider networks MUST be interconnectable? I would need to have the possibility with the local network (vxlan) to communicate between instances within the cloud, and external connectivity would be done using provider vlan. each provider VLAN will be used only on one compute node. is it possible? [0] http://paste.openstack.org/show/lUAOzDZdzCCcDrrPCASq/ # full package list in libvirt container [1] http://paste.openstack.org/show/795562/ # controller net-config [2] http://paste.openstack.org/show/795563/ @ compute net-config On Thu, 2 Jul 2020 at 17:36, Alfredo Moralejo Alonso wrote: > > > On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis > wrote: > >> it is, i have image build failing. i can modify yaml used to create >> image. can you remind me which files it would be? >> >> > Right, I see that the patch must not be working fine for centos and the > package is being installed from delorean repos in the log. I guess it > needs an entry to cover the centos 8 case (i'm checking with opstools > maintainer). > > As workaround I'd propose you to use the package from: > > > https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ > > or alternatively applying some local patch to tripleo-puppet-elements. > > >> and your question, "how it can impact kvm": >> >> in image most of the packages get deployed from deloren repos. I believe >> part is from centos repos and part of whole packages in >> overcloud-full.qcow2 are from deloren. so it might have bit different minor >> version, that might be incompactible... at least it have happend for me >> previously with train release so i used tested ci fully from the >> beginning... >> I might be for sure wrong. >> > > Delorean repos contain only OpenStack packages, things like nova, etc... > not kvm or things included in CentOS repos. KVM will always installed which > should be installed from "Advanced Virtualization" repository. May you > check what versions of qemu-kvm and libvirt you got installed into the > overcloud-full image?, it should match with the versions in: > > > http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ > > like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jayadityagupta11 at gmail.com Mon Jul 6 08:51:41 2020 From: jayadityagupta11 at gmail.com (jayaditya gupta) Date: Mon, 6 Jul 2020 10:51:41 +0200 Subject: [python-openstackclient] microversion support Message-ID: Hi , we discussed the microversion support in PTG meeting .I would like to get started with it. How can I help with this? Currently OpenStack CLI is defaulting to nova api V2, we want to change it so it takes latest version. for issue reference see this : https://storyboard.openstack.org/#!/story/2007727 Best Regards Jayaditya Gupta -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Mon Jul 6 08:59:37 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Mon, 6 Jul 2020 10:59:37 +0200 Subject: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor In-Reply-To: References: Message-ID: I have created a network with geneve and it worked. Previous network which it used by default was vlan. First of all, thank you Arkady for LogTool ;) Second, how to modify my config, to have VLAN working? NeutronNetworkType: 'vlan,geneve' NeutronTunnelTypes: 'vxlan' NeutronBridgeMappings: 'default:br-provider' NeutronGlobalPhysnetMtu: 1500 NeutronBridgeMappings: datacentre:br-ex NeutronExternalNetworkBridge: 'br-ex' my compute network layout. [1] http://paste.openstack.org/show/795562/ # controller net-config [2] http://paste.openstack.org/show/795563/ @ compute net-config [3] http://paste.openstack.org/show/795564/ # ip a s from compute On Mon, 6 Jul 2020 at 10:46, Ruslanas Gžibovskis wrote: > Hi Alfredo, > > since you mentioned, it is not essential to have that opstool, so I have > replaced it with > "sysstat" /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map so > now it is: > "default": { > "oschecks_package": "sysstat" > } > > And you are absolutely right regarding delorean, it took only OSP packages > from that, and kvm and libvirt are at your specified versions. > > And then I believe, I found case for failing VM: > > "6536f105-3f38-41bd-9ddd-6702d23c4ccb] Instance failed to spawn: > nova.exception.PortBindingFailed: Binding failed for port > af8ecd79-ddb8-4ba1-990d-1ccdb76f1442, please check" > > so, my question is: > I have only control (pxe) network, which is distributed between sites and > OSP is having only one network (ControlPlane). How my controller and > compute network should look like? > My controller network looks like [1] and compute like [2]. When I > uncomment in compute br-provider part, it do not deploy. > does br-provider networks MUST be interconnectable? > > I would need to have the possibility with the local network (vxlan) to > communicate between instances within the cloud, and external connectivity > would be done using provider vlan. each provider VLAN will be used only on > one compute node. is it possible? > > > [0] http://paste.openstack.org/show/lUAOzDZdzCCcDrrPCASq/ # full package > list in libvirt container > [1] http://paste.openstack.org/show/795562/ # controller net-config > [2] http://paste.openstack.org/show/795563/ @ compute net-config > > On Thu, 2 Jul 2020 at 17:36, Alfredo Moralejo Alonso > wrote: > >> >> >> On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis >> wrote: >> >>> it is, i have image build failing. i can modify yaml used to create >>> image. can you remind me which files it would be? >>> >>> >> Right, I see that the patch must not be working fine for centos and the >> package is being installed from delorean repos in the log. I guess it >> needs an entry to cover the centos 8 case (i'm checking with opstools >> maintainer). >> >> As workaround I'd propose you to use the package from: >> >> >> https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripleo/ >> >> or alternatively applying some local patch to tripleo-puppet-elements. >> >> >>> and your question, "how it can impact kvm": >>> >>> in image most of the packages get deployed from deloren repos. I believe >>> part is from centos repos and part of whole packages in >>> overcloud-full.qcow2 are from deloren. so it might have bit different minor >>> version, that might be incompactible... at least it have happend for me >>> previously with train release so i used tested ci fully from the >>> beginning... >>> I might be for sure wrong. >>> >> >> Delorean repos contain only OpenStack packages, things like nova, etc... >> not kvm or things included in CentOS repos. KVM will always installed which >> should be installed from "Advanced Virtualization" repository. May you >> check what versions of qemu-kvm and libvirt you got installed into the >> overcloud-full image?, it should match with the versions in: >> >> >> http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packages/q/ >> >> like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm >> >> >>> >>> -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Jul 6 09:13:02 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 6 Jul 2020 11:13:02 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <20200623102448.eocahkszcd354b5d@skaplons-mac> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: <88D19264-9611-4D44-9F78-02B5E56AFD7E@redhat.com> Hi, Bump. Anyone has got any thoughts about it? > On 23 Jun 2020, at 12:24, Slawek Kaplonski wrote: > > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by the > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different architecture, > moving us away from Python agents communicating with the Neutron API service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and it > works fine. Recently (See [7]) we also proposed to add an OVN based job to the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein cycle > and there were no any significant issues related strictly to this change. It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to > Devstack repo - we don’t want to force everyone else to add “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack > projects to check if it will not break anything in the gate of those projects. > 5. If all will be running fine, merge patches proposed in points 3 and 3a. > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Principal software engineer Red Hat From thierry at openstack.org Mon Jul 6 09:19:23 2020 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 6 Jul 2020 11:19:23 +0200 Subject: [largescale-sig] Next meeting: July 8, 8utc Message-ID: <41af7bd5-5aaa-566d-a99c-dc19873b2422@openstack.org> Hi everyone, Hot on the heels of the OpenDev event on Large scale deployments, the Large Scale SIG will have a meeting this week on Wednesday, July 8 at 8 UTC[1] in the #openstack-meeting-3 channel on IRC: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200708T08 Feel free to add topics to our agenda at: https://etherpad.openstack.org/p/large-scale-sig-meeting A reminder of the TODOs we had from last meeting, in case you have time to make progress on them: - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - ttx to produce a draft of the 10th birthday slide for the SIG Talk to you all on Wednesday, -- Thierry Carrez From tobias.urdin at binero.com Mon Jul 6 09:49:01 2020 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 6 Jul 2020 09:49:01 +0000 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <20200623102448.eocahkszcd354b5d@skaplons-mac> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: <1594028941528.18866@binero.com> Hello Slawek, This is very interesting and I think this is the right way to go, speakin from an operator standpoint here. We've started investing time in getting familiar with OVN, how to operate and how to troubleshoot and are looking forward into offloading a lot of work to OVN in the future. We are closely looking how we can integrate hardware offloading with OVN+OVS to improve our performance and in the future looking to the new VirtIO backend support for vDPA that has started to mature more. >From an operator's view, after getting familiar with OVN, there is a lot of work that needs to be done behind the scenes in order to get to the desired point. * Geneve offloading on NIC, we might need new NICs or new firmware. * We need to migrate away from VXLAN to Geneve encapsulation, how can we migrate our current baremetal approach * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red Hat has driven some work to perform this (an Geneve migration) but there is minimal testing or real world deployments that has tried or documented the approach. * And then all misc stuff, we need to look into the new ovn-metadata-agent, should we move Octavia over to OVN yet? Then the final, what do we gain vs what do we lose in terms of maintainability, performance and features. But form an operator's view, I'm very positive to the future of a OVN integrated OpenStack. Best regards Tobias ________________________________________ From: Slawek Kaplonski Sent: Tuesday, June 23, 2020 12:24 PM To: OpenStack Discuss ML Cc: Assaf Muller; Daniel Alvarez Sanchez Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend Hi, The Neutron team wants to propose a switch of the default Neutron backend in Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to OVN with its own ovn-metadata-agent and ovn-controller. We discussed that change during the virtual PTG - see [1]. In this document we want to explain reasons why we want to do that change. OVN in 75 Words --------------- Open Virtual Network is managed under the OVS project, and was created by the original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, using lessons learned throughout the years. It is intended to be used in projects such as OpenStack and Kubernetes. OVN has a different architecture, moving us away from Python agents communicating with the Neutron API service via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. Here’s a heap of information about OpenStack’s integration of OVN: * OpenStack Boston Summit talk on OVN [2] * Upstream OpenStack networking-ovn documentation [3] and [4] * OSP 13 OVN documentation, including how to install it using Director [5] Neutron OVN driver was developed as a Neutron stadium project, "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main Neutron repository. Why? ---- In the Neutron team we believe that OVN and the Neutron OVN driver are built with a modern architecture that offers better foundations for a simpler and more performant solution. We see increased participation in kubernetes-ovn, resulting in a larger core OVN community, and we would like OpenStack to benefit from this Kubernetes driven OVN investment. Neutron OVN driver currently has got some feature parity gaps comparing to ML2/OVS (see [6] for details) but our team is working hard to close those gaps and we believe that this driver is the future for Neutron and that’s why we want to make it the default Neutron ML2 backend in the Devstack configuration. What Does it Mean? ------------------ Since most Openstack projects use Neutron in their CI and gate jobs, this change has the potential for a large impact. But this backend is already tested with various jobs in the Neutron CI and it works fine. Recently (See [7]) we also proposed to add an OVN based job to the Devstack’s check queue. Similarly the default Neutron backend in TripleO was changed in the Stein cycle and there were no any significant issues related strictly to this change. It worked well for other projects. Of course in the Neutron project we will be still gating other drivers, like ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of some of the jobs. The Neutron team is *NOT* going to deprecate any of the other existing ML2 drivers. We will be still maintaining Linuxbridge, OVS and other in-tree drivers in the same way as it is now. Action Plan ----------- We want to make this change before the Victoria-2 milestone to not make such changes too late in the release cycle. Our action plan is as below: 1. Share the plan and get feedback from the upstream community (this thread) 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to Devstack repo - we don’t want to force everyone else to add “enable_plugin neutron” in their local.conf file to use default Neutron backend, 3. Switch default Neutron backend in Devstack to be OVN, a. Switch definition of base devstack CI jobs that it will run Neutron with OVN backend, 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack projects to check if it will not break anything in the gate of those projects. 5. If all will be running fine, merge patches proposed in points 3 and 3a. [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 [2] https://www.youtube.com/watch?v=sgc7myiX6ts [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html [4] https://docs.openstack.org/neutron/latest/ovn/index.html [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html [7] https://review.opendev.org/#/c/736021/ -- Slawek Kaplonski Senior software engineer Red Hat From radoslaw.piliszek at gmail.com Mon Jul 6 10:10:50 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jul 2020 12:10:50 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <88D19264-9611-4D44-9F78-02B5E56AFD7E@redhat.com> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <88D19264-9611-4D44-9F78-02B5E56AFD7E@redhat.com> Message-ID: On Mon, Jul 6, 2020 at 11:15 AM Slawek Kaplonski wrote: > > Hi, > > Bump. Anyone has got any thoughts about it? +2, happy to stress OVN OpenStack-wise. :-) -yoctozepto From lyarwood at redhat.com Mon Jul 6 10:57:21 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 6 Jul 2020 11:57:21 +0100 Subject: [nova][stable] The openstack/nova stable/pike branch is currently unmaintained Message-ID: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> Hello all, Following on from my recent mail about the stable/ocata branch of the openstack/nova project now being unmaintained [1] I'd also like to move the stable/pike [2] branch formally into this phase of maintenance [3]. Volunteers are welcome to step forward and attempt to move the branch back to the ``Extended Maintenance`` phase by proposing changes and fixing CI in the next 3 months, otherwise the branch will be marked as ``EOL`` [4]. Again hopefully this isn't taking anyone by surprise but please let me know if this is going to be an issue! Regards, [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html [2] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike [3] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained [4] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From juliaashleykreger at gmail.com Mon Jul 6 13:12:57 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 6 Jul 2020 06:12:57 -0700 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Greetings everyone! We'll use our meetpad[1]! -Julia [1]: https://meetpad.opendev.org/ironic On Mon, Jul 6, 2020 at 12:48 AM Dmitry Tantsur wrote: > > Hi all, > > Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. > > Dmitry > > On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: >> >> Hi all, >> >> Since we're switching to 6 releases per year cadence, I think it makes sense to have short virtual meetups after every release. The goal will be to sync on priorities, exchange ideas and define plans for the upcoming 2 months of development. Fooling around is also welcome! >> >> Please vote for the best 2 hours slot next week: https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more potential time zones, so apologies for so many options. Please cast your vote until Friday, 12pm UTC, so that I can announce the final time slot this week. >> >> Dmitry From akekane at redhat.com Mon Jul 6 13:39:24 2020 From: akekane at redhat.com (Abhishek Kekane) Date: Mon, 6 Jul 2020 19:09:24 +0530 Subject: [glance] Global Request ID issues in Glance In-Reply-To: <03b6180a-a287-818c-695e-42c006ce1347@secustack.com> References: <03b6180a-a287-818c-695e-42c006ce1347@secustack.com> Message-ID: Hi Markus, Thank you for detailed analysis. Both cases you pointed out are valid bugs. Could you please report this to launchpad? Thanks & Best Regards, Abhishek Kekane On Fri, Jun 26, 2020 at 6:33 PM Markus Hentsch wrote: > Hello everyone, > > while I was experimenting with the Global Request ID functionality of > OpenStack [1], I identified two issues in Glance related to this topic. > I have written my findings below and would appreciate it if you could > take a look and confirm whether those are intended behaviors or indeed > issues with the implementation. > > In case of the latter please advice me which bug tracker to report them > to. > > > 1. The Glance client does not correctly forward the global ID > > When the SessionClient class is used, the global_request_id is removed > from kwargs in the constructor using pop() [2]. Directly after this, > the parent constructor is called using super(), which in this case is > Adapter from the keystoneauth1 library. Therein the global_request_id > is set again [3] but since it has been removed from the kwargs, it > defaults to None as specified in the Adapter's __init__() header. Thus, > the global_request_id passed to the SessionClient constructor never > actually makes it to the Glance API. This is in contrast to the > HTTPClient class, where get() is used instead of pop() [4]. > > This can be reproduced simply by creating a server in Nova from an > image in Glance, which will attempt to create the Glance client > instance using the global_request_id [5]. Passing the > "X-Openstack-Request-Id" header during the initial API call for the > server creation, makes it visible in Nova (using a suitable > "logging_context_format_string" setting) but it's not visible in > Glance. Using a Python debugger shows Glance generating a new local ID > instead. > > > 2. Glance interprets global ID as local one for Oslo Context objects > > While observing the Glance log file, I observed Glance always logging > the global_request_id instead of a local one if it is available. > > Using "%(global_request_id)s" within "logging_context_format_string"[6] > in the glance-api.conf will always print "None" in the logs whereas > "%(request_id)s" will either be an ID generated by Glance if no global > ID is available or the received global ID. > > Culprit seems to be the context middleware of Glance where the global > ID in form of the "X-Openstack-Request-Id" header is parsed from the > request and passed as "request_id" instead of "global_request_id" to > the "glance.context.RequestContext.from_environ()" call [7]. > > This is in contrast to other services such as Nova or Neutron where > the two variables actually print the values according to their name > (request_id always being the local one, whereas global_request_id is > the global one or None). > > > [1] > > https://specs.openstack.org/openstack/oslo-specs/specs/pike/global-req-id.html > [2] > > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L355 > [3] > > https://github.com/openstack/keystoneauth/blob/dab8e1057ae8bb9a0e778fb8d3141ad4fb36a339/keystoneauth1/adapter.py#L166 > [4] > > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L162 > [5] > > https://github.com/openstack/nova/blob/1cae0cd7229207478b70275509aecd778ca69225/nova/image/glance.py#L78 > [6] > > https://docs.openstack.org/oslo.context/2.17.0/user/usage.html#context-variables > [7] > > https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/api/middleware/context.py#L201 > > > Thanks! > > Markus > > -- > Markus Hentsch > Team Leader > > secustack GmbH - Digital Sovereignty in the Cloud > https://www.secustack.com > Königsbrücker Straße 96 (Gebäude 30) | 01099 Dresden > District Court Dresden, Register Number: HRB 38890 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ionut at fleio.com Mon Jul 6 14:17:10 2020 From: ionut at fleio.com (Ionut Biru) Date: Mon, 6 Jul 2020 17:17:10 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi Rafael, I have an error and I cannot resolve it myself. https://paste.xinu.at/LEfdXD/ Do you happen to know what's wrong? endpoint list https://paste.xinu.at/v3j1jl/ octavia.yaml https://paste.xinu.at/TIxfOz/ polling.yaml https://paste.xinu.at/oBEFj/ pipeline.yaml https://paste.xinu.at/qvEdTX/ On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > Good catch. I fixed the docs. > https://review.opendev.org/#/c/739288/ > > On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: > >> Hi, >> >> I just noticed that the example dynamic.network.services.vpn.connection >> from >> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >> the wrong indentation. >> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >> >> Now I have to see why is not polling from it >> >> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >> >>> Hi Rafael, >>> >>> I think I applied all the reviews successfully but I tried to do an >>> octavia dynamic poller but I have couples of errors. >>> >>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>> >>> if i remove the - in front of name like this: >>> https://paste.xinu.at/K7s5I8/ >>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>> >>> Is there something I missed or is something wrong in yaml? >>> >>> >>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> >>>> Since the merging window for ussuri was long passed for those commits, >>>>> is it safe to assume that it will not land in stable/ussuri at all and >>>>> those will be available for victoria? >>>>> >>>> >>>> I would say so. We are lacking people to review and then merge it. >>>> >>>> How safe is to cherry pick those commits and use them in production? >>>>> >>>> As long as the person executing the cherry-picks, and maintaining the >>>> code knows what she/he is doing, you should be safe. The guys that are >>>> using this implementation (and others that I and my colleagues proposed), >>>> have a few openstack components that are customized with the >>>> patches/enhancements/extensions we developed so far; this means, they are >>>> not using the community version, but something in-between (the community >>>> releases + the patches we did). Of course, it is only possible, because we >>>> are the ones creating and maintaining these codes; therefore, we can assure >>>> quality for production. >>>> >>>> >>>> >>>> >>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>> >>>>> Hello Rafael, >>>>> >>>>> Since the merging window for ussuri was long passed for those commits, >>>>> is it safe to assume that it will not land in stable/ussuri at all and >>>>> those will be available for victoria? >>>>> >>>>> How safe is to cherry pick those commits and use them in production? >>>>> >>>>> >>>>> >>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>>>> However, there are some important PRs still waiting for a merge, that might >>>>>> be important for your use case: >>>>>> * https://review.opendev.org/#/c/722092/ >>>>>> * https://review.opendev.org/#/c/715180/ >>>>>> * https://review.opendev.org/#/c/715289/ >>>>>> * https://review.opendev.org/#/c/679999/ >>>>>> * https://review.opendev.org/#/c/709807/ >>>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>> cgoncalves at redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> I want to meter the loadbalancer into gnocchi for billing purposes >>>>>>>> in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>> >>>>>>> >>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>> >>>>>>> >>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>> >>>>>>> >>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>> the Ceilometer project. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Ionut, >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hello guys, >>>>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>>>> the following: >>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>> - network.services.lb.listener >>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>> - network.services.lb.member >>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>> - network.services.lb.pool >>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>> >>>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>>> supported in neutron. >>>>>>>>>> >>>>>>>>>> I found >>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>> but this is not available in stein or train. >>>>>>>>>> >>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>> octavia. >>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>> deployed and has status active. >>>>>>>>>> >>>>>>>>> >>>>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>>>> the full load balancer status tree [1]. Additionally, Octavia has >>>>>>>>> three API endpoints for statistics [2][3][4]. >>>>>>>>> >>>>>>>>> I hope this helps with your use case. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Carlos >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>> [2] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>> [3] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>> [4] >>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Jul 6 14:57:27 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 6 Jul 2020 10:57:27 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. # Patches ## Open Review - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 - Update goal selection docs to clarify the goal count https://review.opendev.org/739150 - Add legacy repository validation https://review.opendev.org/737559 - Add "tc:approved-release" tag to manila https://review.opendev.org/738105 - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 ## Project Updates - Add deprecated cycle for deprecated deliverables https://review.opendev.org/737590 - No longer track refstack repos in governance https://review.opendev.org/737962 - Add Neutron Arista plugin charm to OpenStack charms https://review.opendev.org/737734 ## General Changes - Add links to chosen release names https://review.opendev.org/738867 - Add storyboard link to migrate-to-focal goal https://review.opendev.org/738129 - TC Guide Follow Ups https://review.opendev.org/737650 # Email Threads - New Office Hours: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015761.html - OSU Intern Work: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015760.html - Summit Programming Committee Nominations Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015756.html - Summit CFP Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015730.html # Other Reminders - OpenStack's 10th anniversary community meeting should be happening July 16th: more info coming soon! - If you're an operator, make sure you fill out our user survey: https://www.openstack.org/user-survey/survey-2020/ Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From elod.illes at est.tech Mon Jul 6 16:02:17 2020 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Mon, 6 Jul 2020 18:02:17 +0200 Subject: [nova][stable] The openstack/nova stable/pike branch is currently unmaintained In-Reply-To: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> References: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> Message-ID: <57f7b5e7-3838-0ce5-4601-80eb7585e41b@est.tech> Just a heads-up that a devstack patch [1] addresses the issues in Pike. As soon as that is merging, stable/pike hopefully will be ready to accept fixes. I'll try to keep Pike working, but of course, anyone who is interested to help are welcome. :) [1] https://review.opendev.org/#/c/735616/ Thanks, Előd On 2020. 07. 06. 12:57, Lee Yarwood wrote: > Hello all, > > Following on from my recent mail about the stable/ocata branch of the > openstack/nova project now being unmaintained [1] I'd also like to move > the stable/pike [2] branch formally into this phase of maintenance [3]. > > Volunteers are welcome to step forward and attempt to move the branch > back to the ``Extended Maintenance`` phase by proposing changes and > fixing CI in the next 3 months, otherwise the branch will be marked as > ``EOL`` [4]. > > Again hopefully this isn't taking anyone by surprise but please let me > know if this is going to be an issue! > > Regards, > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html > [2] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike > [3] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained > [4] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life > From juliaashleykreger at gmail.com Mon Jul 6 16:15:16 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 6 Jul 2020 09:15:16 -0700 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Greetings fellow humans! We had a great two hour session but we ran out of time to get back to the discussion of a capability/driver support matrix. We agreed we should have a call later in the week to dive back into the topic. I've created a doodle[1] for us to identify the best time for a hopefully quick 30 minute call to try and reach consensus. Thanks everyone! -Julia [1]: https://doodle.com/poll/kte79im2tz4ape9v On Mon, Jul 6, 2020 at 6:12 AM Julia Kreger wrote: > > Greetings everyone! > > We'll use our meetpad[1]! > > -Julia > > [1]: https://meetpad.opendev.org/ironic > > On Mon, Jul 6, 2020 at 12:48 AM Dmitry Tantsur wrote: > > > > Hi all, > > > > Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. > > > > Dmitry > > > > On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: > >> > >> Hi all, > >> > >> Since we're switching to 6 releases per year cadence, I think it makes sense to have short virtual meetups after every release. The goal will be to sync on priorities, exchange ideas and define plans for the upcoming 2 months of development. Fooling around is also welcome! > >> > >> Please vote for the best 2 hours slot next week: https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more potential time zones, so apologies for so many options. Please cast your vote until Friday, 12pm UTC, so that I can announce the final time slot this week. > >> > >> Dmitry From rafaelweingartner at gmail.com Mon Jul 6 17:11:47 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Mon, 6 Jul 2020 14:11:47 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: It looks like a coding error that we left behind during a major refactoring that we introduced upstream. I created a patch for it. Can you check/review and test it? https://review.opendev.org/739555 On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: > Hi Rafael, > > I have an error and I cannot resolve it myself. > > https://paste.xinu.at/LEfdXD/ > > Do you happen to know what's wrong? > > endpoint list https://paste.xinu.at/v3j1jl/ > octavia.yaml https://paste.xinu.at/TIxfOz/ > polling.yaml https://paste.xinu.at/oBEFj/ > pipeline.yaml https://paste.xinu.at/qvEdTX/ > > > On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> Good catch. I fixed the docs. >> https://review.opendev.org/#/c/739288/ >> >> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >> >>> Hi, >>> >>> I just noticed that the example dynamic.network.services.vpn.connection >>> from >>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>> the wrong indentation. >>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>> >>> Now I have to see why is not polling from it >>> >>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>> >>>> Hi Rafael, >>>> >>>> I think I applied all the reviews successfully but I tried to do an >>>> octavia dynamic poller but I have couples of errors. >>>> >>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>> >>>> if i remove the - in front of name like this: >>>> https://paste.xinu.at/K7s5I8/ >>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>> >>>> Is there something I missed or is something wrong in yaml? >>>> >>>> >>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> >>>>> Since the merging window for ussuri was long passed for those commits, >>>>>> is it safe to assume that it will not land in stable/ussuri at all and >>>>>> those will be available for victoria? >>>>>> >>>>> >>>>> I would say so. We are lacking people to review and then merge it. >>>>> >>>>> How safe is to cherry pick those commits and use them in production? >>>>>> >>>>> As long as the person executing the cherry-picks, and maintaining the >>>>> code knows what she/he is doing, you should be safe. The guys that are >>>>> using this implementation (and others that I and my colleagues proposed), >>>>> have a few openstack components that are customized with the >>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>> not using the community version, but something in-between (the community >>>>> releases + the patches we did). Of course, it is only possible, because we >>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>> quality for production. >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>> >>>>>> Hello Rafael, >>>>>> >>>>>> Since the merging window for ussuri was long passed for those >>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>> and those will be available for victoria? >>>>>> >>>>>> How safe is to cherry pick those commits and use them in production? >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>> rafaelweingartner at gmail.com> wrote: >>>>>> >>>>>>> The dynamic pollster in Ceilometer will be first released in Ussuri. >>>>>>> However, there are some important PRs still waiting for a merge, that might >>>>>>> be important for your use case: >>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>> cgoncalves at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I want to meter the loadbalancer into gnocchi for billing purposes >>>>>>>>> in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>> >>>>>>>> >>>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>> >>>>>>>> >>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>> >>>>>>>> >>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>>> the Ceilometer project. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Ionut, >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hello guys, >>>>>>>>>>> I was trying to add in polling.yaml and pipeline from ceilometer >>>>>>>>>>> the following: >>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>> - network.services.lb.member >>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>> >>>>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>>>> supported in neutron. >>>>>>>>>>> >>>>>>>>>>> I found >>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>> >>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>> octavia. >>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>> deployed and has status active. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You can get the provisioning and operating status of Octavia load >>>>>>>>>> balancers via the Octavia API. There is also an API endpoint that returns >>>>>>>>>> the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>> >>>>>>>>>> I hope this helps with your use case. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Carlos >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>> [2] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>> [3] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>> [4] >>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Rafael Weingärtner >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Mon Jul 6 18:37:07 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 14:37:07 -0400 Subject: removing use of pkg_resources to improve command line app performance Message-ID: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command line apps). Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and produces data in a format that can be cached to make it even faster. I have started adding support for that caching to stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the same library is available on PyPI as “importlib_metadata”. A big part of the implementation work will actually be removing the use of pkg_resources in places other than stevedore. We have a couple of different use patterns to consider and replace in different ways. First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) of the available plugins by name. Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster because importlib goes directly to the metadata file for the named package instead of looking through all of the installed packages. Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the manager abstractions in stevedore instead of manipulating EntryPoint instances directly. I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. Doug [0] https://review.opendev.org/#/c/739306/ [1] https://docs.openstack.org/stevedore/latest/ [2] https://review.opendev.org/#/c/739379/2 [3] https://review.opendev.org/#/q/topic:osc-performance From smooney at redhat.com Mon Jul 6 18:54:05 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 06 Jul 2020 19:54:05 +0100 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> Message-ID: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: > We have had a long-standing issue with the performance of the openstack command line tool. At least part of the > startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of > importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a > command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command > line apps). > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and > produces data in a format that can be cached to make it even faster. I have started adding support for that caching to > stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the > same library is available on PyPI as “importlib_metadata”. based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? > > A big part of the implementation work will actually be removing the use of pkg_resources in places other than > stevedore. We have a couple of different use patterns to consider and replace in different ways. > > First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to > choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for > all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager > directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) > of the available plugins by name. > > Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s > installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster > because importlib goes directly to the metadata file for the named package instead of looking through all of the > installed packages. > > Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need > to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. > The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in > stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the > manager abstractions in stevedore instead of manipulating EntryPoint instances directly. > > I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely > to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the > work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. > > Doug > > [0] https://review.opendev.org/#/c/739306/ > [1] https://docs.openstack.org/stevedore/latest/ > [2] https://review.opendev.org/#/c/739379/2 > [3] https://review.opendev.org/#/q/topic:osc-performance > From fungi at yuggoth.org Mon Jul 6 19:02:46 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 6 Jul 2020 19:02:46 +0000 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: <20200706190246.c4u7thhjixgavbjj@yuggoth.org> On 2020-07-06 19:54:05 +0100 (+0100), Sean Mooney wrote: > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: [...] > > Python 3.8 added a new library importlib.metadata, which also > > has an entry points API. It is more efficient, and produces data > > in a format that can be cached to make it even faster. I have > > started adding support for that caching to stevedore [0], which > > is the Oslo library for managing application plugins. For > > version of python earlier than 3.8, the same library is > > available on PyPI as “importlib_metadata”. > > based on > https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst > we still need to support 3.6 for victoria. is there a backport lib > like mock for this on older python releases? [...] According to https://pypi.org/project/importlib-metadata/ the current version (1.7.0) supports Python 3.5 and later. Won't that work? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From doug at doughellmann.com Mon Jul 6 19:03:00 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 15:03:00 -0400 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: > On Jul 6, 2020, at 2:54 PM, Sean Mooney wrote: > > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: >> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the >> startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of >> importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a >> command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command >> line apps). >> >> Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and >> produces data in a format that can be cached to make it even faster. I have started adding support for that caching to >> stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the >> same library is available on PyPI as “importlib_metadata”. > based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need > to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which version of the library is invoked will be hidden. >> >> A big part of the implementation work will actually be removing the use of pkg_resources in places other than >> stevedore. We have a couple of different use patterns to consider and replace in different ways. >> >> First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to >> choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for >> all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager >> directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) >> of the available plugins by name. >> >> Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s >> installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster >> because importlib goes directly to the metadata file for the named package instead of looking through all of the >> installed packages. >> >> Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need >> to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. >> The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in >> stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the >> manager abstractions in stevedore instead of manipulating EntryPoint instances directly. >> >> I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely >> to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the >> work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. >> >> Doug >> >> [0] https://review.opendev.org/#/c/739306/ >> [1] https://docs.openstack.org/stevedore/latest/ >> [2] https://review.opendev.org/#/c/739379/2 >> [3] https://review.opendev.org/#/q/topic:osc-performance From radoslaw.piliszek at gmail.com Mon Jul 6 19:06:05 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 6 Jul 2020 21:06:05 +0200 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: On Mon, Jul 6, 2020 at 9:00 PM Sean Mooney wrote: > > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: > > We have had a long-standing issue with the performance of the openstack command line tool. At least part of the > > startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of > > importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a > > command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command > > line apps). > > > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and > > produces data in a format that can be cached to make it even faster. I have started adding support for that caching to > > stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the > > same library is available on PyPI as “importlib_metadata”. > based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need > to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? Is [1] that Doug mentioned not what you mean? It seems to support 3.5+ As a general remark, I've already seen the WIP. Very excited to see this performance bottleneck eliminated. [1] https://pypi.org/project/importlib-metadata/ -yoctozepto From doug at doughellmann.com Mon Jul 6 19:21:06 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 15:21:06 -0400 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> Message-ID: <1C860D98-B45B-4AC7-8BE4-5A1DCFEBD15C@doughellmann.com> > On Jul 6, 2020, at 2:37 PM, Doug Hellmann wrote: > > We have had a long-standing issue with the performance of the openstack command line tool. At least part of the startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command line apps). > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and produces data in a format that can be cached to make it even faster. I have started adding support for that caching to stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, the same library is available on PyPI as “importlib_metadata”. > > A big part of the implementation work will actually be removing the use of pkg_resources in places other than stevedore. We have a couple of different use patterns to consider and replace in different ways. > > First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) of the available plugins by name. > > Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster because importlib goes directly to the metadata file for the named package instead of looking through all of the installed packages. > > Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the manager abstractions in stevedore instead of manipulating EntryPoint instances directly. > > I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. > > Doug > > [0] https://review.opendev.org/#/c/739306/ > [1] https://docs.openstack.org/stevedore/latest/ > [2] https://review.opendev.org/#/c/739379/2 > [3] https://review.opendev.org/#/q/topic:osc-performance I neglected to mention that there are uses of pkg_resources outside of OpenStack code in libraries used by python-openstackclient. I found a use in dogpile and another in cmd2. I haven’t started working on patches to those, yet. If someone wants to do a more extensive search that would be very helpful. I started an etherpad to keep track of the work that’s in progress: https://etherpad.opendev.org/p/osc-performance From fungi at yuggoth.org Mon Jul 6 19:29:24 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 6 Jul 2020 19:29:24 +0000 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: <1C860D98-B45B-4AC7-8BE4-5A1DCFEBD15C@doughellmann.com> References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <1C860D98-B45B-4AC7-8BE4-5A1DCFEBD15C@doughellmann.com> Message-ID: <20200706192924.xl76yl5k4ct47gh3@yuggoth.org> On 2020-07-06 15:21:06 -0400 (-0400), Doug Hellmann wrote: [...] > I neglected to mention that there are uses of pkg_resources > outside of OpenStack code in libraries used by > python-openstackclient. I found a use in dogpile and another in > cmd2. I haven’t started working on patches to those, yet. If > someone wants to do a more extensive search that would be very > helpful. I started an etherpad to keep track of the work that’s in > progress: https://etherpad.opendev.org/p/osc-performance Looking at some other uses of pkg_resources, seems like this would be the new way to get the abbreviated Git commit ID stored by PBR: json.loads( importlib.metadata.distribution(packagename).read_text('pbr.json') )['git_version'] -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From smooney at redhat.com Mon Jul 6 19:30:45 2020 From: smooney at redhat.com (Sean Mooney) Date: Mon, 06 Jul 2020 20:30:45 +0100 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: On Mon, 2020-07-06 at 15:03 -0400, Doug Hellmann wrote: > > On Jul 6, 2020, at 2:54 PM, Sean Mooney wrote: > > > > On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: > > > We have had a long-standing issue with the performance of the openstack command line tool. At least part of the > > > startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of > > > importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by > > > a > > > command line application (long-running services are candidates, too, but the benefit is bigger in short-lived > > > command > > > line apps). > > > > > > Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and > > > produces data in a format that can be cached to make it even faster. I have started adding support for that > > > caching to > > > stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, > > > the > > > same library is available on PyPI as “importlib_metadata”. > > > > based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need > > to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? > > Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and > if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which > version of the library is invoked will be hidden. cool i will need to check os-vif more closely but i think we do everthing via the stevedore extension manager https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L38-L49 maybe some plugins are doing some things tehy should not but the intent was to rely only on stevedore and its apis. so it sound like this should just work for os-vif at least. > > > > > > > A big part of the implementation work will actually be removing the use of pkg_resources in places other than > > > stevedore. We have a couple of different use patterns to consider and replace in different ways. > > > > > > First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of > > > them to > > > choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation > > > for > > > all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a > > > stevedore.ExtensionManager > > > directly, but the other managers are meant to implement common access patterns like selecting a subset (or just > > > one) > > > of the available plugins by name. > > > > > > Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s > > > installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* > > > faster > > > because importlib goes directly to the metadata file for the named package instead of looking through all of the > > > installed packages. > > > > > > Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may > > > need > > > to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from > > > pkg_resources. > > > The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that > > > in > > > stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use > > > the > > > manager abstractions in stevedore instead of manipulating EntryPoint instances directly. > > > > > > I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s > > > likely > > > to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the > > > work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. > > > > > > Doug > > > > > > [0] https://review.opendev.org/#/c/739306/ > > > [1] https://docs.openstack.org/stevedore/latest/ > > > [2] https://review.opendev.org/#/c/739379/2 > > > [3] https://review.opendev.org/#/q/topic:osc-performance > > From doug at doughellmann.com Mon Jul 6 19:33:05 2020 From: doug at doughellmann.com (Doug Hellmann) Date: Mon, 6 Jul 2020 15:33:05 -0400 Subject: removing use of pkg_resources to improve command line app performance In-Reply-To: References: <70F8544A-EB43-45E5-AC81-729CFB9CA63C@doughellmann.com> <9992b1938b56f4e7318d30a5f2a0e27dc7ff3a61.camel@redhat.com> Message-ID: <6BBE8880-0CCC-40A1-8BAB-C9A992B310B5@doughellmann.com> > On Jul 6, 2020, at 3:30 PM, Sean Mooney wrote: > > On Mon, 2020-07-06 at 15:03 -0400, Doug Hellmann wrote: >>> On Jul 6, 2020, at 2:54 PM, Sean Mooney wrote: >>> >>> On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: >>>> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the >>>> startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of >>>> importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by >>>> a >>>> command line application (long-running services are candidates, too, but the benefit is bigger in short-lived >>>> command >>>> line apps). >>>> >>>> Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and >>>> produces data in a format that can be cached to make it even faster. I have started adding support for that >>>> caching to >>>> stevedore [0], which is the Oslo library for managing application plugins. For version of python earlier than 3.8, >>>> the >>>> same library is available on PyPI as “importlib_metadata”. >>> >>> based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need >>> to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? >> >> Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and >> if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which >> version of the library is invoked will be hidden. > cool i will need to check os-vif more closely but i think we do everthing via the stevedore extension manager > https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L38-L49 > maybe some plugins are doing some things tehy should not but the intent was to rely only on stevedore and its apis. > so it sound like this should just work for os-vif at least. That’s definitely the goal of putting the cache behind the stevedore API. >> >>>> >>>> A big part of the implementation work will actually be removing the use of pkg_resources in places other than >>>> stevedore. We have a couple of different use patterns to consider and replace in different ways. >>>> >>>> First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of >>>> them to >>>> choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation >>>> for >>>> all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a >>>> stevedore.ExtensionManager >>>> directly, but the other managers are meant to implement common access patterns like selecting a subset (or just >>>> one) >>>> of the available plugins by name. >>>> >>>> Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s >>>> installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* >>>> faster >>>> because importlib goes directly to the metadata file for the named package instead of looking through all of the >>>> installed packages. >>>> >>>> Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may >>>> need >>>> to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from >>>> pkg_resources. >>>> The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that >>>> in >>>> stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use >>>> the >>>> manager abstractions in stevedore instead of manipulating EntryPoint instances directly. >>>> >>>> I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s >>>> likely >>>> to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the >>>> work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches. >>>> >>>> Doug >>>> >>>> [0] https://review.opendev.org/#/c/739306/ >>>> [1] https://docs.openstack.org/stevedore/latest/ >>>> [2] https://review.opendev.org/#/c/739379/2 >>>> [3] https://review.opendev.org/#/q/topic:osc-performance From lyarwood at redhat.com Mon Jul 6 21:50:04 2020 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 6 Jul 2020 22:50:04 +0100 Subject: [nova][stable] The openstack/nova stable/pike branch is currently unmaintained In-Reply-To: <57f7b5e7-3838-0ce5-4601-80eb7585e41b@est.tech> References: <20200706105721.a7ciwltuskjxxksu@lyarwood.usersys.redhat.com> <57f7b5e7-3838-0ce5-4601-80eb7585e41b@est.tech> Message-ID: <20200706215004.e74qvc45ypa3umd3@lyarwood.usersys.redhat.com> On 06-07-20 18:02:17, Előd Illés wrote: > Just a heads-up that a devstack patch [1] addresses the issues in Pike. As > soon as that is merging, stable/pike hopefully will be ready to accept > fixes. I'll try to keep Pike working, but of course, anyone who is > interested to help are welcome. :) > > [1] https://review.opendev.org/#/c/735616/ Excellent thanks Előd! Assuming that change lands in devstack and we start landing changes in openstack/nova again then the branch will return to the Extended Maintenance phase. > On 2020. 07. 06. 12:57, Lee Yarwood wrote: > > Hello all, > > > > Following on from my recent mail about the stable/ocata branch of the > > openstack/nova project now being unmaintained [1] I'd also like to move > > the stable/pike [2] branch formally into this phase of maintenance [3]. > > > > Volunteers are welcome to step forward and attempt to move the branch > > back to the ``Extended Maintenance`` phase by proposing changes and > > fixing CI in the next 3 months, otherwise the branch will be marked as > > ``EOL`` [4]. > > > > Again hopefully this isn't taking anyone by surprise but please let me > > know if this is going to be an issue! > > > > Regards, > > > > [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html > > [2] https://review.opendev.org/#/q/project:openstack/nova+branch:stable/pike > > [3] https://docs.openstack.org/project-team-guide/stable-branches.html#unmaintained > > [4] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From zhengyupann at 163.com Tue Jul 7 07:39:03 2020 From: zhengyupann at 163.com (Zhengyu Pan) Date: Tue, 7 Jul 2020 15:39:03 +0800 (CST) Subject: [neutron][lbaas][octavia] How to implement health check using 100.64.0.0/10 network segments in loadbalancer? Message-ID: <45299896.5594.1732836be3d.Coremail.zhengyupann@163.com> There are some private cloud or public cloud introduction: They use 100.64.0.0/14 network segments to check vm's health status in load balancer. In Region supporting VPC, load balancing private network IP and health check IP will be switched to 100 network segment. I can't understand how to implement it. How to do it? -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From ionut at fleio.com Tue Jul 7 07:52:46 2020 From: ionut at fleio.com (Ionut Biru) Date: Tue, 7 Jul 2020 10:52:46 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Seems to work fine now. Thanks. On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > It looks like a coding error that we left behind during a major > refactoring that we introduced upstream. > I created a patch for it. Can you check/review and test it? > https://review.opendev.org/739555 > > On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: > >> Hi Rafael, >> >> I have an error and I cannot resolve it myself. >> >> https://paste.xinu.at/LEfdXD/ >> >> Do you happen to know what's wrong? >> >> endpoint list https://paste.xinu.at/v3j1jl/ >> octavia.yaml https://paste.xinu.at/TIxfOz/ >> polling.yaml https://paste.xinu.at/oBEFj/ >> pipeline.yaml https://paste.xinu.at/qvEdTX/ >> >> >> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> Good catch. I fixed the docs. >>> https://review.opendev.org/#/c/739288/ >>> >>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>> >>>> Hi, >>>> >>>> I just noticed that the example dynamic.network.services.vpn.connection >>>> from >>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>> the wrong indentation. >>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>> >>>> Now I have to see why is not polling from it >>>> >>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>> >>>>> Hi Rafael, >>>>> >>>>> I think I applied all the reviews successfully but I tried to do an >>>>> octavia dynamic poller but I have couples of errors. >>>>> >>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>>> >>>>> if i remove the - in front of name like this: >>>>> https://paste.xinu.at/K7s5I8/ >>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>> >>>>> Is there something I missed or is something wrong in yaml? >>>>> >>>>> >>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> >>>>>> Since the merging window for ussuri was long passed for those >>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>> and those will be available for victoria? >>>>>>> >>>>>> >>>>>> I would say so. We are lacking people to review and then merge it. >>>>>> >>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>> >>>>>> As long as the person executing the cherry-picks, and maintaining the >>>>>> code knows what she/he is doing, you should be safe. The guys that are >>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>> have a few openstack components that are customized with the >>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>> not using the community version, but something in-between (the community >>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>> quality for production. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>> >>>>>>> Hello Rafael, >>>>>>> >>>>>>> Since the merging window for ussuri was long passed for those >>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>> and those will be available for victoria? >>>>>>> >>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>> >>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>> that might be important for your use case: >>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>> >>>>>>>>> >>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>>>> the Ceilometer project. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Ionut, >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello guys, >>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>> >>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that were >>>>>>>>>>>> supported in neutron. >>>>>>>>>>>> >>>>>>>>>>>> I found >>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>> >>>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>>> octavia. >>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>> returns the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>>> >>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Carlos >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>> [2] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>> [3] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>> [4] >>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Rafael Weingärtner >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From anilj.mailing at gmail.com Tue Jul 7 07:54:13 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Tue, 7 Jul 2020 00:54:13 -0700 Subject: OpenStack cluster event notification Message-ID: Hi All, So far, based on my understanding of OpenStack Python SDK, I am able to read the Hypervisor, Servers instances, however, I do not see an API to receive and handle the change notification/events for the operations that happens on the cluster e.g. A new VM is added, an existing VM is deleted etc. I see a documentation, which talks about emitting notifications over a message bus that indicate different events that occur within the service. Notifications in OpenStack https://docs.openstack.org/ironic/latest/admin/notifications.html 1. Does Openstack Python SDK support notification APIs? 2. How do I receive/monitor notifications for VM related changes? 3. How do I receive/monitor notifications for compute/hypervisor related changes? 4. How do I receive/monitor notifications for Virtual Switch related changes? Thanks in advance for any help in this regard. /anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.schaefer at cloudandheat.com Tue Jul 7 08:43:47 2020 From: jonas.schaefer at cloudandheat.com (Jonas =?ISO-8859-1?Q?Sch=E4fer?=) Date: Tue, 07 Jul 2020 10:43:47 +0200 Subject: Neutron bandwidth metering based on remote address Message-ID: <2890841.xduM2AgYMW@antares> Dear list, We are trying to implement tenant bandwidth metering at the neutron router level. Since some of the network spaces connected to the external interface of the neutron router are supposed to be unmetered, we need to match on the remote address. Conveniently, there exists a --remote-ip-prefix option on meter label create; however, since [1], its meaning was changed to the exact opposite: Instead of matching on the *remote* prefix (towards the external interface), it matches on the *local* prefix (towards the OS tenant network). In an ideal world, we would want to revert that change and instead introduce a --local-ip-prefix option which covers that use-case. I suppose this is not a thing we *should* do though, given that this change made it into a few releases already. Instead, we’ll have to create a new option (which whatever name) + associated database schema + iptables rule patterns to implement the feature. The questions associated with this are now: - Does this make absolutely no sense to anyone? - What is the process for this? I suppose since this change was made intentionally and passed review, our desired change needs to go through a feature request process (blueprints maybe?). kind regards, Jonas Schäfer [1]: https://opendev.org/openstack/neutron/commit/ 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 -- Jonas Schäfer DevOps Engineer Cloud&Heat Technologies GmbH Königsbrücker Straße 96 | 01099 Dresden +49 351 479 367 37 jonas.schaefer at cloudandheat.com | www.cloudandheat.com New Service: Managed Kubernetes designed for AI & ML https://managed-kubernetes.cloudandheat.com/ Commercial Register: District Court Dresden Register Number: HRB 30549 VAT ID No.: DE281093504 Managing Director: Nicolas Röhrs Authorized signatory: Dr. Marius Feldmann Authorized signatory: Kristina Rübenkamp -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. URL: From ionut at fleio.com Tue Jul 7 10:49:05 2020 From: ionut at fleio.com (Ionut Biru) Date: Tue, 7 Jul 2020 13:49:05 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hello again, What's the proper way to handle dynamic pollsters in gnocchi ? Right now ceilometer returns: WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia is not handled by Gnocchi I found https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html but I'm not sure if is the right direction. On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: > Seems to work fine now. Thanks. > > On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> It looks like a coding error that we left behind during a major >> refactoring that we introduced upstream. >> I created a patch for it. Can you check/review and test it? >> https://review.opendev.org/739555 >> >> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >> >>> Hi Rafael, >>> >>> I have an error and I cannot resolve it myself. >>> >>> https://paste.xinu.at/LEfdXD/ >>> >>> Do you happen to know what's wrong? >>> >>> endpoint list https://paste.xinu.at/v3j1jl/ >>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>> polling.yaml https://paste.xinu.at/oBEFj/ >>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>> >>> >>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> Good catch. I fixed the docs. >>>> https://review.opendev.org/#/c/739288/ >>>> >>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>> >>>>> Hi, >>>>> >>>>> I just noticed that the example >>>>> dynamic.network.services.vpn.connection from >>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>> the wrong indentation. >>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>> >>>>> Now I have to see why is not polling from it >>>>> >>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>> >>>>>> Hi Rafael, >>>>>> >>>>>> I think I applied all the reviews successfully but I tried to do an >>>>>> octavia dynamic poller but I have couples of errors. >>>>>> >>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>>>> >>>>>> if i remove the - in front of name like this: >>>>>> https://paste.xinu.at/K7s5I8/ >>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>> >>>>>> Is there something I missed or is something wrong in yaml? >>>>>> >>>>>> >>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>> rafaelweingartner at gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>> and those will be available for victoria? >>>>>>>> >>>>>>> >>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>> >>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>>> >>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>> the code knows what she/he is doing, you should be safe. The guys that are >>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>> have a few openstack components that are customized with the >>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>> not using the community version, but something in-between (the community >>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>> quality for production. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>>> >>>>>>>> Hello Rafael, >>>>>>>> >>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>> and those will be available for victoria? >>>>>>>> >>>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>> >>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>> that might be important for your use case: >>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if you >>>>>>>>>> wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer to >>>>>>>>>> the Ceilometer project. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>> >>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>> >>>>>>>>>>>>> I found >>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>> >>>>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>>>> octavia. >>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>> returns the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>>>> >>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Carlos >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>> [2] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>> [3] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>> [4] >>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Rafael Weingärtner >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Rafael Weingärtner >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Jul 7 11:43:06 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 7 Jul 2020 08:43:06 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: That is the right direction. I don't know why people hard-coded the initial pollsters' configs and did not document the relation between Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a single system, but interdependent systems to implement a monitoring solution. Ceilometer is the component that gathers data/information, processes, and then persists it somewhere. Gnocchi is one of the options that Ceilometer can use to persist data. By default, Ceilometer creates some basic configurations in Gnocchi to store data, such as some default resource-types with default attributes. However, we do not need (should not) rely on this default config. You can create and use custom resources to fit the stack to your needs. This can be achieved via `gnocchi resource-type create -a :: ` and `gnocchi resource-type create -u :: `. Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), you can customize the mapping of metrics to resource-types in Gnocchi. On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: > Hello again, > > What's the proper way to handle dynamic pollsters in gnocchi ? > Right now ceilometer returns: > > WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia is > not handled by Gnocchi > > I found > https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html > but I'm not sure if is the right direction. > > On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: > >> Seems to work fine now. Thanks. >> >> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> It looks like a coding error that we left behind during a major >>> refactoring that we introduced upstream. >>> I created a patch for it. Can you check/review and test it? >>> https://review.opendev.org/739555 >>> >>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>> >>>> Hi Rafael, >>>> >>>> I have an error and I cannot resolve it myself. >>>> >>>> https://paste.xinu.at/LEfdXD/ >>>> >>>> Do you happen to know what's wrong? >>>> >>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>> >>>> >>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> Good catch. I fixed the docs. >>>>> https://review.opendev.org/#/c/739288/ >>>>> >>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I just noticed that the example >>>>>> dynamic.network.services.vpn.connection from >>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>> the wrong indentation. >>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>> >>>>>> Now I have to see why is not polling from it >>>>>> >>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>> >>>>>>> Hi Rafael, >>>>>>> >>>>>>> I think I applied all the reviews successfully but I tried to do an >>>>>>> octavia dynamic poller but I have couples of errors. >>>>>>> >>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>> Error is about syntax error near name: https://paste.xinu.at/MHgDBY/ >>>>>>> >>>>>>> if i remove the - in front of name like this: >>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>> >>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>> and those will be available for victoria? >>>>>>>>> >>>>>>>> >>>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>>> >>>>>>>> How safe is to cherry pick those commits and use them in production? >>>>>>>>> >>>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>>> the code knows what she/he is doing, you should be safe. The guys that are >>>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>>> have a few openstack components that are customized with the >>>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>>> not using the community version, but something in-between (the community >>>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>>> quality for production. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>>>> >>>>>>>>> Hello Rafael, >>>>>>>>> >>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>> and those will be available for victoria? >>>>>>>>> >>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>> production? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>> that might be important for your use case: >>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer >>>>>>>>>>> to the Ceilometer project. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>> >>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I found >>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was wondering if there is a way to meter loadbalancers from >>>>>>>>>>>>>> octavia. >>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>>> returns the full load balancer status tree [1]. Additionally, Octavia >>>>>>>>>>>>> has three API endpoints for statistics [2][3][4]. >>>>>>>>>>>>> >>>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Carlos >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>> [2] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>> [3] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>> [4] >>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Rafael Weingärtner >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Rafael Weingärtner >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Tue Jul 7 12:09:29 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 7 Jul 2020 09:09:29 -0300 Subject: Neutron bandwidth metering based on remote address In-Reply-To: <2890841.xduM2AgYMW@antares> References: <2890841.xduM2AgYMW@antares> Message-ID: Hallo Jonas, I have worked to address this specific use case. First, the part of the solution that is already implemented. If you only need to gather metrics in a tenant fashion, you can take a look into this PR: https://review.opendev.org/#/c/735605/. That pull request enables operators to configure shared traffic labels, and then, these traffic labels will be exposed/published with different granularities. The different granularities are router, tenant, label, router-label, and tenant-label. The complete explanation can be found in the "RST" document that the PR also introduces, where we wrote a complete description of neutron metering, its configs, and usage. You are welcome to review and help us get this PR merged :) So far, if all you need is to measure the whole traffic, but in different granularities, that PR will probably be enough. On the other hand, if you need to create more complex rules to filter by source/destination IPs, then we need something else. Interestingly enough, we are working towards that. We will extend neutron API, and neutron metering to allow operators to use "remote-ip" and "source-ip" to create metering labels rules. We also saw the PR that changed the behavior of the "remote-ip" property, and the whole confusion it caused (at least for us). However, instead of proposing to revert it, we are working towards enabling the API to handle "remote-ip" and "source-ip", which will cover the use case of the person that introduced that commit, and many others such as ours and yours (probably). On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < jonas.schaefer at cloudandheat.com> wrote: > Dear list, > > We are trying to implement tenant bandwidth metering at the neutron router > level. Since some of the network spaces connected to the external > interface of > the neutron router are supposed to be unmetered, we need to match on the > remote address. > > Conveniently, there exists a --remote-ip-prefix option on meter label > create; > however, since [1], its meaning was changed to the exact opposite: Instead > of > matching on the *remote* prefix (towards the external interface), it > matches > on the *local* prefix (towards the OS tenant network). > > In an ideal world, we would want to revert that change and instead > introduce a > --local-ip-prefix option which covers that use-case. I suppose this is not > a > thing we *should* do though, given that this change made it into a few > releases already. > > Instead, we’ll have to create a new option (which whatever name) + > associated > database schema + iptables rule patterns to implement the feature. > > The questions associated with this are now: > > - Does this make absolutely no sense to anyone? > - What is the process for this? I suppose since this change was made > intentionally and passed review, our desired change needs to go through a > feature request process (blueprints maybe?). > > kind regards, > Jonas Schäfer > > [1]: https://opendev.org/openstack/neutron/commit/ > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 > > > -- > Jonas Schäfer > DevOps Engineer > > Cloud&Heat Technologies GmbH > Königsbrücker Straße 96 | 01099 Dresden > +49 351 479 367 37 > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > New Service: > Managed Kubernetes designed for AI & ML > https://managed-kubernetes.cloudandheat.com/ > > Commercial Register: District Court Dresden > Register Number: HRB 30549 > VAT ID No.: DE281093504 > Managing Director: Nicolas Röhrs > Authorized signatory: Dr. Marius Feldmann > Authorized signatory: Kristina Rübenkamp > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus.hentsch at secustack.com Tue Jul 7 12:49:11 2020 From: markus.hentsch at secustack.com (Markus Hentsch) Date: Tue, 7 Jul 2020 14:49:11 +0200 Subject: [glance] Global Request ID issues in Glance In-Reply-To: References: <03b6180a-a287-818c-695e-42c006ce1347@secustack.com> Message-ID: Hi Abhishek, thanks for having a look! I've filed corresponding bug reports: Glance client: https://bugs.launchpad.net/python-glanceclient/+bug/1886650 Glance API: https://bugs.launchpad.net/glance/+bug/1886657 Best regards, Markus Abhishek Kekane wrote: > Hi Markus, > > Thank you for detailed analysis. > Both cases you pointed out are valid bugs. Could you please report > this to launchpad? > > Thanks & Best Regards, > > Abhishek Kekane > > > On Fri, Jun 26, 2020 at 6:33 PM Markus Hentsch > > > wrote: > > Hello everyone, > > while I was experimenting with the Global Request ID functionality of > OpenStack [1], I identified two issues in Glance related to this > topic. > I have written my findings below and would appreciate it if you could > take a look and confirm whether those are intended behaviors or indeed > issues with the implementation. > > In case of the latter please advice me which bug tracker to report > them > to. > > > 1. The Glance client does not correctly forward the global ID > > When the SessionClient class is used, the global_request_id is removed > from kwargs in the constructor using pop() [2]. Directly after this, > the parent constructor is called using super(), which in this case is > Adapter from the keystoneauth1 library. Therein the global_request_id > is set again [3] but since it has been removed from the kwargs, it > defaults to None as specified in the Adapter's __init__() header. > Thus, > the global_request_id passed to the SessionClient constructor never > actually makes it to the Glance API. This is in contrast to the > HTTPClient class, where get() is used instead of pop() [4]. > > This can be reproduced simply by creating a server in Nova from an > image in Glance, which will attempt to create the Glance client > instance using the global_request_id [5]. Passing the > "X-Openstack-Request-Id" header during the initial API call for the > server creation, makes it visible in Nova (using a suitable > "logging_context_format_string" setting) but it's not visible in > Glance. Using a Python debugger shows Glance generating a new local ID > instead. > > > 2. Glance interprets global ID as local one for Oslo Context objects > > While observing the Glance log file, I observed Glance always logging > the global_request_id instead of a local one if it is available. > > Using "%(global_request_id)s" within > "logging_context_format_string"[6] > in the glance-api.conf will always print "None" in the logs whereas > "%(request_id)s" will either be an ID generated by Glance if no global > ID is available or the received global ID. > > Culprit seems to be the context middleware of Glance where the global > ID in form of the "X-Openstack-Request-Id" header is parsed from the > request and passed as "request_id" instead of "global_request_id" to > the "glance.context.RequestContext.from_environ()" call [7]. > > This is in contrast to other services such as Nova or Neutron where > the two variables actually print the values according to their name > (request_id always being the local one, whereas global_request_id is > the global one or None). > > > [1] > https://specs.openstack.org/openstack/oslo-specs/specs/pike/global-req-id.html > [2] > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L355 > [3] > https://github.com/openstack/keystoneauth/blob/dab8e1057ae8bb9a0e778fb8d3141ad4fb36a339/keystoneauth1/adapter.py#L166 > [4] > https://github.com/openstack/python-glanceclient/blob/de178ac4382716cc93022be06b93697936e816fc/glanceclient/common/http.py#L162 > [5] > https://github.com/openstack/nova/blob/1cae0cd7229207478b70275509aecd778ca69225/nova/image/glance.py#L78 > [6] > https://docs.openstack.org/oslo.context/2.17.0/user/usage.html#context-variables > [7] > https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/api/middleware/context.py#L201 > > > Thanks! > > Markus > > -- > Markus Hentsch > Team Leader > > secustack GmbH - Digital Sovereignty in the Cloud > https://www.secustack.com > Königsbrücker Straße 96 (Gebäude 30) | 01099 Dresden > District Court Dresden, Register Number: HRB 38890 > > -- Markus Hentsch Team Leader secustack GmbH - Digital Sovereignty in the Cloud https://www.secustack.com Königsbrücker Straße 96 (Gebäude 30) | 01099 Dresden District Court Dresden, Register Number: HRB 38890 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Tue Jul 7 14:09:09 2020 From: stephenfin at redhat.com (Stephen Finucane) Date: Tue, 07 Jul 2020 15:09:09 +0100 Subject: [nova] Changes for out-of-tree drivers Message-ID: <4ceab688ecde83dc4da8dda567a355e499cd8c6f.camel@redhat.com> I have a change proposed [1] as part of the work to add vTPM support to nova that will modify the arguments for the 'unrescue' function. As noted in the commit message, this is expected to gain a 'context' argument and lose the currently unused 'network_info' argument. If you maintain an out-of-tree driver, you will need to account for this change. Cheers, Stephen [1] https://review.opendev.org/#/c/730382/ From gagehugo at gmail.com Tue Jul 7 19:48:09 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Tue, 7 Jul 2020 14:48:09 -0500 Subject: [openstack-helm] Proposing Andrii Ostapenko for core of OpenStack-Helm Message-ID: Hello everyone, Andrii Ostapenko (andrii_ostapenko) has been very active lately in the openstack-helm community, notably his efforts in driving loci forward as well as him maintaining a lot of our images and providing great in-depth reviews. Due to these reasons, I am proposing Andrii as a core reviewer for OpenStack-Helm. If anyone has any feedback, please feel free to reply here by the end of the week! -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Jul 7 21:13:28 2020 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 7 Jul 2020 14:13:28 -0700 Subject: [all][TC] New Office Hours Times In-Reply-To: References: Message-ID: Hello! I wanted to push this to the top of people's inboxes again. It looks like we are still missing several TC member's responses, and I would love some more community response as well since the office hours are FOR you! Please take a few min to fill out the survey for new office hours times[1]. -Kendall (diablo_rojo) [1] https://doodle.com/poll/q27t8pucq7b8xbme On Thu, Jul 2, 2020 at 2:52 PM Kendall Nelson wrote: > Hello! > > It's been a while since the office hours had been refreshed and we have a > lot of new people on the TC that were not around when the times were set. > > In an effort to stir things up a bit, and get more community engagement, > we are picking new times! > > I want to invite everyone in the community interested in interacting more > with the TC to respond to the poll so we have your input as the office > hours are really for your benefit anyway. (Nevermind the name of the poll > :) Too much work to remake the whole thing just to rename it..) > > That said, we do need responses from ALL TC members so that we can also > document who will (typically) be present for each office hour as well. > > (Also, thanks Mohammed for putting the poll together! It's no joke. ) > > -Kendall (diablo_rojo) > > [1] https://doodle.com/poll/q27t8pucq7b8xbme > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Tue Jul 7 21:53:35 2020 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 7 Jul 2020 16:53:35 -0500 Subject: [keystone][zun] Choice between 'ca_file' and 'cafile' In-Reply-To: References: Message-ID: <1635499f-ade9-07b4-191f-36ee431923dd@nemebean.com> On 7/2/20 2:23 AM, Radosław Piliszek wrote: > On Wed, Jul 1, 2020 at 10:31 PM Sean McGinnis wrote: >> >> On 7/1/20 2:24 PM, Hongbin Lu wrote: >>> Hi all, >>> >>> A short question. I saw a few projects are using the name 'ca_file' >>> [1] as config option, while others are using 'cafile' [2]. I wonder >>> what is the flavorite name convention? >>> >>> I asked this question because Kolla developer suggested Zun to rename >>> from 'ca_file' to 'cafile' to avoid the confusion [3]. I want to >>> confirm if this is a good idea from Keystone's perspective. Thanks. >>> >>> Best regards, >>> Hongbin >>> >>> [1] >>> http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27ca_file%27&i=nope&files=&repos= >>> [2] >>> http://codesearch.openstack.org/?q=cfg.StrOpt%5C(%27cafile%27&i=nope&files=&repos= >>> [3] https://review.opendev.org/#/c/738329/ >> >> Cinder and Glance both use ca_file (and ssl_ca_file and vmware_ca_file, >> and registry_client_ca_file). >> From keystone_auth, we do also have cafile. >> >> Personally, I find the separation of ca_file to be much easier to read. >> >> Sean >> >> > > Yeah, it was me to suggest the aliasing. We found that the 'cafile' > seems more prevalent. We missed that underscore for Zun and scratched > our heads "what are we doing wrong there?". Sounds like a job for https://docs.openstack.org/oslo.config/latest/cli/validator.html ;-) I don't have a strong opinion on which we should choose, but I will note that whichever it is, we can leave deprecated names for the other so nobody gets broken by the change. Probably incomplete lists of references to both names: http://codesearch.openstack.org/?q=StrOpt%5C(%27ca_file%27&i=nope&files=&repos= http://codesearch.openstack.org/?q=StrOpt%5C(%27cafile%27&i=nope&files=&repos= Unfortunately keystone and oslo.service differ, so no matter which we choose a lot of projects are going to inherit a deprecated opt. From juliaashleykreger at gmail.com Tue Jul 7 23:18:33 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 7 Jul 2020 16:18:33 -0700 Subject: [ironic] 2nd Victoria meetup In-Reply-To: References: Message-ID: Greetings fellow humans, Following up, the consensus seems to have arrived at 2:30 PM UTC tomorrow (Wednesday). This is 7:30 US Pacific. We will use meetpad[1]. Thanks everyone! -Julia [1]: https://meetpad.opendev.org/ironic On Mon, Jul 6, 2020 at 9:15 AM Julia Kreger wrote: > > Greetings fellow humans! > > We had a great two hour session but we ran out of time to get back to > the discussion of a capability/driver support matrix. > > We agreed we should have a call later in the week to dive back into > the topic. I've created a doodle[1] for us to identify the best time > for a hopefully quick 30 minute call to try and reach consensus. > > Thanks everyone! > > -Julia > > [1]: https://doodle.com/poll/kte79im2tz4ape9v > > On Mon, Jul 6, 2020 at 6:12 AM Julia Kreger wrote: > > > > Greetings everyone! > > > > We'll use our meetpad[1]! > > > > -Julia > > > > [1]: https://meetpad.opendev.org/ironic > > > > On Mon, Jul 6, 2020 at 12:48 AM Dmitry Tantsur wrote: > > > > > > Hi all, > > > > > > Sorry for the late notice, the meetup will be *today*, July 6th from 2pm to 4pm UTC. We will likely use meetpad (I need to sync with Julia on it), please stop by IRC before the call for the exact link. Because of the time conflict, it will replace our weekly meeting. > > > > > > Dmitry > > > > > > On Tue, Jun 30, 2020 at 1:50 PM Dmitry Tantsur wrote: > > >> > > >> Hi all, > > >> > > >> Since we're switching to 6 releases per year cadence, I think it makes sense to have short virtual meetups after every release. The goal will be to sync on priorities, exchange ideas and define plans for the upcoming 2 months of development. Fooling around is also welcome! > > >> > > >> Please vote for the best 2 hours slot next week: https://doodle.com/poll/3r9tbhmniattkty8. I tried to include more potential time zones, so apologies for so many options. Please cast your vote until Friday, 12pm UTC, so that I can announce the final time slot this week. > > >> > > >> Dmitry From juliaashleykreger at gmail.com Tue Jul 7 23:32:51 2020 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 7 Jul 2020 16:32:51 -0700 Subject: OpenStack cluster event notification In-Reply-To: References: Message-ID: Greetings! Unfortunately, I don't know the eventing context outside of ironic's ability to emit them, but I'll try my best to answer questions with my context. On Tue, Jul 7, 2020 at 1:02 AM Anil Jangam wrote: > > Hi All, > > So far, based on my understanding of OpenStack Python SDK, I am able to read the Hypervisor, Servers instances, however, I do not see an API to receive and handle the change notification/events for the operations that happens on the cluster e.g. A new VM is added, an existing VM is deleted etc. > > I see a documentation, which talks about emitting notifications over a message bus that indicate different events that occur within the service. > > Notifications in OpenStack > > https://docs.openstack.org/ironic/latest/admin/notifications.html I suspect you may also find https://docs.openstack.org/nova/latest/reference/notifications.html useful. > > Does Openstack Python SDK support notification APIs? I'm going to guess the answer is no to this. As you noted earlier, the notifications are emitted to the message bus. These notifications can be read by a subscriber to the message bus itself, but this also means that the bus is directly connected to by some sort of messaging client. The Python SDK is intended for developers to use to leverage the REST APIs offered by services and components, not the message bus. > How do I receive/monitor notifications for VM related changes? > How do I receive/monitor notifications for compute/hypervisor related changes? > How do I receive/monitor notifications for Virtual Switch related changes? I think what you are looking for is ceilometer. https://docs.openstack.org/ceilometer/latest/admin/telemetry-data-collection.html#notifications Although that being said, I don't think much would really prevent you from consuming the notifications directly from the message bus, if you so desire. Maybe someone already has some code for this on hand. > > Thanks in advance for any help in this regard. > Hope this helped. > /anil. > -Julia From gouthampravi at gmail.com Wed Jul 8 00:18:48 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Tue, 7 Jul 2020 17:18:48 -0700 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <20200623102448.eocahkszcd354b5d@skaplons-mac> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: On Tue, Jun 23, 2020 at 3:32 AM Slawek Kaplonski wrote: > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend > in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, > neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by > the > original authors of OVS. It is an attempt to re-do the ML2/OVS control > plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different > architecture, > moving us away from Python agents communicating with the Neutron API > service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the > main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are > built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those > gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack > configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and > it > works fine. Recently (See [7]) we also proposed to add an OVN based job to > the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein > cycle > and there were no any significant issues related strictly to this change. > It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, > like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the > names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make > such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this > thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron > repo to > Devstack repo - we don’t want to force everyone else to add > “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron > with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main > OpenStack > projects to check if it will not break anything in the gate of those > projects. > +1 This plan looks great. We test Neutron integration quite a bit in OpenStack Manila devstack jobs and in third party CI associated with the project. We've tested OVN in the past and noticed it made share server provisioning faster and more reliable. So I don't think we would be affected negatively should you change the default mechanism and driver. However, please keep us in mind, and perhaps alert me when you post patches so we can test everything is okay. > 5. If all will be running fine, merge patches proposed in points 3 and 3a. > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Jul 8 08:51:01 2020 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 8 Jul 2020 10:51:01 +0200 Subject: [largescale-sig] Next meeting: July 8, 8utc In-Reply-To: <41af7bd5-5aaa-566d-a99c-dc19873b2422@openstack.org> References: <41af7bd5-5aaa-566d-a99c-dc19873b2422@openstack.org> Message-ID: Meeting logs at: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-07-08-08.00.html TODOs: - ttx to identify from the chat interested candidates from Opendev event and invite them to next meeting - amorin to add some meat to the wiki page before we push the Nova doc patch further - all to describe briefly how you solved metrics/billing in your deployment in https://etherpad.openstack.org/p/large-scale-sig-documentation - amorin to start a thread on osarchiver proposing to land it somewhere in openstack - amorin to start a [largescale-sig] thread about his middleware ping approach, SIG members can comment if that makes sense for them Next meeting: Jul 22, 8:00UTC on #openstack-meeting-3 -- Thierry Carrez (ttx) From reza.b2008 at gmail.com Wed Jul 8 11:39:48 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Wed, 8 Jul 2020 16:09:48 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails Message-ID: Hi, I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} Any suggestion would be grateful. Regards, Reza -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at est.tech Wed Jul 8 12:05:29 2020 From: balazs.gibizer at est.tech (=?iso-8859-1?q?Bal=E1zs?= Gibizer) Date: Wed, 08 Jul 2020 14:05:29 +0200 Subject: OpenStack cluster event notification In-Reply-To: References: Message-ID: <59G5DQ.5J8F1FJXF7IT3@est.tech> On Tue, Jul 7, 2020 at 16:32, Julia Kreger wrote: [snip] > > Although that being said, I don't think much would really prevent you > from consuming the notifications directly from the message bus, if you > so desire. Maybe someone already has some code for this on hand. Here is some example code that forwards the nova versioned notifications from the message bus out to a client via websocket [1]. I used this sample code in my demo [2] during a summit presentation. Cheers, gibi [1] https://github.com/gibizer/nova-notification-demo/blob/master/ws_forwarder.py [2] https://www.youtube.com/watch?v=WFq5JWXa9AM From laszlo.budai at gmail.com Wed Jul 8 14:21:53 2020 From: laszlo.budai at gmail.com (Budai Laszlo) Date: Wed, 8 Jul 2020 17:21:53 +0300 Subject: [Neutron] GRE network MTU Message-ID: <7b9a4951-8451-e495-d582-ef6eec15182c@gmail.com> Dear all, what is the maximum MTU value for a GRE network? How is that related to the physical interfaces' MTU? Thank you, Laszlo From marek.lycka at ultimum.io Wed Jul 8 14:45:26 2020 From: marek.lycka at ultimum.io (=?UTF-8?B?TWFyZWsgTHnEjWth?=) Date: Wed, 8 Jul 2020 16:45:26 +0200 Subject: [Cinder] Message-ID: Hi all, I'm currently looking into extending Nova API to allow on-demand VM quiescing with the ultimate goal being improved Cinder snapshot creation. The spec is undergoing review at the moment and I was wondering if someone from Cinder would be kind enough to look it over and give their thoughts on it: https://review.opendev.org/#/c/702810/ Thanks in advance. -- Marek Lyčka Linux Developer Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic marek.lycka at ultimum.io *https://ultimum.io * -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Wed Jul 8 15:13:42 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 8 Jul 2020 09:13:42 -0600 Subject: [tripleo] updating the language we use in our code base Message-ID: Greetings, At this moment in time we have an opportunity to be a more open and inclusive project by eliminating outdated naming conventions from our code base [1]. We should take the opportunity and do our best to replace outdated terms with their more inclusive alternatives. Chris Wright wrote a nice blog post on the subject [2], please take a second to review Bogdan's spec and Chris's blog post. Also a thank you to Emilien, Alex and Bogdan for already getting started. In other news Arx Cruz will be starting a similar thread for the tempest project. Thanks Arx! [1] https://review.opendev.org/#/c/740013/1/specs/victoria/renaming_rules.rst [2] https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language Patches to be aware of: https://review.opendev.org/#/c/738858/ https://review.opendev.org/#/c/738894/ https://review.opendev.org/#/c/740013 Thanks for your time! -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Wed Jul 8 18:20:46 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 8 Jul 2020 11:20:46 -0700 Subject: [neutron][lbaas][octavia] How to implement health check using 100.64.0.0/10 network segments in loadbalancer? In-Reply-To: <45299896.5594.1732836be3d.Coremail.zhengyupann@163.com> References: <45299896.5594.1732836be3d.Coremail.zhengyupann@163.com> Message-ID: Hi Zhengyu, I'm not sure I understand your question, so I'm going to take a guess. Correct me if I am answering the wrong question. First question I have is are you using the EOL neutron-lbaas or Octavia? I will assume you are using Octavia. When you add a member server (backend web server for example), you have a few options: 1. If you create a member without the "subnet_id" option, the load balancer will attempt to route to the member IP address over the VIP subnet. Health monitor checks will also follow this route. 2. If, when you create the member, you specify the "subnet_id" option to a valid neutron subnet, the load balancer will be attached to that subnet and will route to the member IP address. If you do not specify a "monitor_address", health monitoring will follow the same route as the member IP address. 3. If you create a member, with the "monitor_address" specified, traffic will be routed to the member IP address, but health monitoring checks will be directed to the "monitor_address". To give an example: Say you have a neutron network 436f58c2-0454-49dc-888e-eaafdd178577 with a subnet of e6e46e02-7768-4ae4-89c6-314c34557b5d with CIDR 100.64.0.0/14 on it. When creating the pool member is created you would specify something like: openstack loadbalancer member create --address 100.64.100.5 --subnet-id e6e46e02-7768-4ae4-89c6-314c34557b5d --protocol-port 80 This will attach the neutron network 436f58c2-0454-49dc-888e-eaafdd178577 to the load balancer and allocate an IP address on 100.64.0.0/14 that will be used to contact the member server address of 100.64.100.5. Health monitor checks will also follow this same path to the member server. We have some documentation for this in the cookbook here: https://docs.openstack.org/octavia/latest/user/guides/basic-cookbook.html#deploy-a-basic-http-load-balancer-with-a-health-monitor I hope this helps clarify, Michael On Tue, Jul 7, 2020 at 12:45 AM Zhengyu Pan wrote: > > > There are some private cloud or public cloud introduction: They use 100.64.0.0/14 network segments to check vm's health status in load balancer. In Region supporting VPC, load balancing private network IP and health check IP will be switched to 100 network segment. I can't understand how to implement it. How to do it? > > > > > > -- > > > > > From rosmaita.fossdev at gmail.com Wed Jul 8 21:14:51 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 8 Jul 2020 17:14:51 -0400 Subject: [ops][cinder] festival of EOL - ocata and pike Message-ID: Lee Yarwood recently announced the change to 'unmaintained' status of nova stable/ocata [0] and stable/pike [1] branches, with the clever idea of back-dating the 6 month period of un-maintenance to the most recent commit to each branch. I took a look at cinder stable/ocata and stable/pike, and the most recent commit to each is 8 months ago and 7 months ago, respectively. The Cinder team discussed this at today's Cinder meeting and agreed that this email will serve as notice to the OpenStack Community that the following openstack/cinder branches have been in 'unmaintained' status for the past 6 months: - stable/ocata - stable/pike The Cinder team hereby serves notice that it is our intent to ask the openstack infra team to tag each as EOL at its current HEAD and delete the branches two weeks from today, that is, on Wednesday, 22 July 2020. (This applies also to the other stable-branched cinder repositories, that is, os-brick, python-cinderclient, and python-cinderclient-extension.) Please see [2] for information about the maintenance phases and what action would need to occur before 22 July for a branch to be adopted back to the 'extended maintenance' phase. On behalf of the Cinder team, thank you for your attention to this matter. cheers, brian [0] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html [1] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html [2] https://docs.openstack.org/project-team-guide/stable-branches.html From sunny at openstack.org Wed Jul 8 20:48:00 2020 From: sunny at openstack.org (Sunny Cai) Date: Wed, 8 Jul 2020 13:48:00 -0700 Subject: July OSF Community Meeting - 10 Years of OpenStack Message-ID: Hello everyone, You might have heard that OpenStack is turning 10 this year! On Thursday, July 16 at 8am PT (1500 UTC), we will be holding the 10 years of OpenStack virtual celebration in the July OSF community meeting. I have attached the calendar invite for the July OSF community meeting below. Grab your favorite OpenStack swag and bring your favorite drinks of choice to the meeting on July 16. Let’s do a virtual toast to the 10 incredible years! Please see the etherpad for more meeting information: https://etherpad.opendev.org/p/tTP9ilsAaJ2E8vMnm6uV If you have any questions, please let me know. P.S. To add more fun, feel free to try out the virtual background feature in Zoom. The 10 years of OpenStack virtual background is attached below. Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 10 Years of OpenStack Community Meeting meeting.ics Type: text/calendar Size: 1788 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 10 Years Virtual Background.jpg Type: image/jpeg Size: 530089 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From anilj.mailing at gmail.com Thu Jul 9 06:17:19 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Wed, 8 Jul 2020 23:17:19 -0700 Subject: OpenStack cluster event notification In-Reply-To: <59G5DQ.5J8F1FJXF7IT3@est.tech> References: <59G5DQ.5J8F1FJXF7IT3@est.tech> Message-ID: Thanks Julia for comments. Also thanks Gibi for the github link and sharing the example. I will take a look and adopt it. On Wed, Jul 8, 2020 at 5:05 AM Balázs Gibizer wrote: > > > On Tue, Jul 7, 2020 at 16:32, Julia Kreger > wrote: > [snip] > > > > > Although that being said, I don't think much would really prevent you > > from consuming the notifications directly from the message bus, if you > > so desire. Maybe someone already has some code for this on hand. > > Here is some example code that forwards the nova versioned > notifications from the message bus out to a client via websocket [1]. I > used this sample code in my demo [2] during a summit presentation. > > Cheers, > gibi > > [1] > > https://github.com/gibizer/nova-notification-demo/blob/master/ws_forwarder.py > [2] https://www.youtube.com/watch?v=WFq5JWXa9AM > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.schaefer at cloudandheat.com Thu Jul 9 06:53:26 2020 From: jonas.schaefer at cloudandheat.com (Jonas =?ISO-8859-1?Q?Sch=E4fer?=) Date: Thu, 09 Jul 2020 08:53:26 +0200 Subject: [neutron] bandwidth metering based on remote address In-Reply-To: References: <2890841.xduM2AgYMW@antares> Message-ID: <25308951.foNqEPruJI@antares> Hello Rafael, On Dienstag, 7. Juli 2020 14:09:29 CEST Rafael Weingärtner wrote: > Hallo Jonas, > I have worked to address this specific use case. > > First, the part of the solution that is already implemented. If you only > need to gather metrics in a tenant fashion, you can take a look into this > PR: https://review.opendev.org/#/c/735605/. That pull request enables > operators to configure shared traffic labels, and then, these traffic > labels will be exposed/published with different granularities. The > different granularities are router, tenant, label, router-label, and > tenant-label. The complete explanation can be found in the "RST" document > that the PR also introduces, where we wrote a complete description of > neutron metering, its configs, and usage. You are welcome to review and > help us get this PR merged :) This already looks very useful to us, since it saves us from creating labels for each and every project. > So far, if all you need is to measure the whole traffic, but in different > granularities, that PR will probably be enough. Not quite; as mentioned, we’ll need to carve out specific network areas from metering, those which are in our DCs, but on the other side of the router from the customer perspective. > On the other hand, if you > need to create more complex rules to filter by source/destination IPs, then > we need something else. Interestingly enough, we are working towards that. > We will extend neutron API, and neutron metering to allow operators to use > "remote-ip" and "source-ip" to create metering labels rules. That sounds exactly like what we’d need. > We also saw the PR that changed the behavior of the "remote-ip" property, > and the whole confusion it caused (at least for us). However, instead of > proposing to revert it, we are working towards enabling the API to handle > "remote-ip" and "source-ip", which will cover the use case of the person > that introduced that commit, and many others such as ours and yours > (probably). Sounds good. Is there a way we can collaborate on this? Is there a launchpad bug which tracks that? (Also, is there a launchpad thing for the shared label granularity you’re doing already? I didn’t find one mentioned on the gerrit page.) kind regards, Jonas Schäfer > > On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < > > jonas.schaefer at cloudandheat.com> wrote: > > Dear list, > > > > We are trying to implement tenant bandwidth metering at the neutron router > > level. Since some of the network spaces connected to the external > > interface of > > the neutron router are supposed to be unmetered, we need to match on the > > remote address. > > > > Conveniently, there exists a --remote-ip-prefix option on meter label > > create; > > however, since [1], its meaning was changed to the exact opposite: Instead > > of > > matching on the *remote* prefix (towards the external interface), it > > matches > > on the *local* prefix (towards the OS tenant network). > > > > In an ideal world, we would want to revert that change and instead > > introduce a > > --local-ip-prefix option which covers that use-case. I suppose this is not > > a > > thing we *should* do though, given that this change made it into a few > > releases already. > > > > Instead, we’ll have to create a new option (which whatever name) + > > associated > > database schema + iptables rule patterns to implement the feature. > > > > The questions associated with this are now: > > > > - Does this make absolutely no sense to anyone? > > - What is the process for this? I suppose since this change was made > > intentionally and passed review, our desired change needs to go through a > > feature request process (blueprints maybe?). > > > > kind regards, > > Jonas Schäfer > > > > [1]: https://opendev.org/openstack/neutron/commit/ > > > > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 > > > > > > -- > > Jonas Schäfer > > DevOps Engineer > > > > Cloud&Heat Technologies GmbH > > Königsbrücker Straße 96 | 01099 Dresden > > +49 351 479 367 37 > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > > > New Service: > > Managed Kubernetes designed for AI & ML > > https://managed-kubernetes.cloudandheat.com/ > > > > Commercial Register: District Court Dresden > > Register Number: HRB 30549 > > VAT ID No.: DE281093504 > > Managing Director: Nicolas Röhrs > > Authorized signatory: Dr. Marius Feldmann > > Authorized signatory: Kristina Rübenkamp -- Jonas Schäfer DevOps Engineer Cloud&Heat Technologies GmbH Königsbrücker Straße 96 | 01099 Dresden +49 351 479 367 37 jonas.schaefer at cloudandheat.com | www.cloudandheat.com New Service: Managed Kubernetes designed for AI & ML https://managed-kubernetes.cloudandheat.com/ Commercial Register: District Court Dresden Register Number: HRB 30549 VAT ID No.: DE281093504 Managing Director: Nicolas Röhrs Authorized signatory: Dr. Marius Feldmann Authorized signatory: Kristina Rübenkamp -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. URL: From anilj.mailing at gmail.com Thu Jul 9 08:22:22 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Thu, 9 Jul 2020 01:22:22 -0700 Subject: Hardware requirement for OpenStack HA Cluster Message-ID: Hi All, I am looking for hardware requirements (CPU, RAM, HDD) for installing a OpenStack HA cluster. So far, I gathered few references: - This article talks about CPU and HDD, but they do not comment on RAM. - https://docs.openstack.org/project-deploy-guide/openstack-ansible/ocata/overview-requirements.html - This article talks about CPU, RAM, and HDD, but it is quite old (2015) reference. - https://docs.huihoo.com/openstack/docs.openstack.org/ha-guide/HAGuide.pdf (Page 6) I am considering the cluster with: 3 Controller (for HA) + 1 Compute + 1 Storage. I have following questions: - What is the minimum hardware (CPU, RAM, HDD) requirement to install a OpenStack HA cluster? - Can we have 3 Controller nodes installed on 3 Virtual Machines or do we need 3 independent (bare metal) servers? - So in case of VM-based controllers, the cluster will be hybrid in nature. - I do not know if this is even possible and a recommended design. - Do we need the Platform Director node in addition to controller and compute/storage nodes? Thanks in advance. Anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Jul 9 09:15:18 2020 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 9 Jul 2020 11:15:18 +0200 Subject: [Cinder] [Nova] Quiescing In-Reply-To: References: Message-ID: <20200709091518.r5usx2x3lnejvqmh@localhost> On 08/07, Marek Lyčka wrote: > Hi all, > > I'm currently looking into extending Nova API to allow on-demand VM > quiescing > with the ultimate goal being improved Cinder snapshot creation. The spec is > undergoing > review at the moment and I was wondering if someone from Cinder would be > kind > enough to look it over and give their thoughts on it: > > https://review.opendev.org/#/c/702810/ > > Thanks in advance. > Hi Marek, I'm really glad to hear somebody will be working on this functionality. I have reviewed the spec and Cinder (and probably anyone using the feature) needs a REST API to query the current state of the quiesce, unless the quiesce call is actually synchronous and doesn't return until it's done. Cheers, Gorka. > -- > Marek Lyčka > Linux Developer > > Ultimum Technologies a.s. > Na Poříčí 1047/26, 11000 Praha 1 > Czech Republic > > marek.lycka at ultimum.io > *https://ultimum.io * From skaplons at redhat.com Thu Jul 9 10:08:52 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 12:08:52 +0200 Subject: [neutron] PTL on vacation Message-ID: <86027844-7D61-4D04-9A14-559D56BBEDEA@redhat.com> Hi, For the next 2 weeks, starting Saturday 11th of July I will be on vacation without access to the irc and with very limited access to the email. Miguel Lavalle will run our team meetings during this time. With other things You can always ask one of our drivers [1] or lieutenants [1] https://review.opendev.org/#/admin/groups/464,members [2] https://docs.openstack.org/neutron/latest/contributor/policies/neutron-teams.html#neutron-lieutenants — Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Thu Jul 9 10:43:41 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 12:43:41 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <1594028941528.18866@binero.com> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <1594028941528.18866@binero.com> Message-ID: <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> Hi, Thx for Your feedback. > On 6 Jul 2020, at 11:49, Tobias Urdin wrote: > > Hello Slawek, > This is very interesting and I think this is the right way to go, speakin from an operator standpoint here. > > We've started investing time in getting familiar with OVN, how to operate and how to troubleshoot and > are looking forward into offloading a lot of work to OVN in the future. > > We are closely looking how we can integrate hardware offloading with OVN+OVS to improve our performance > and in the future looking to the new VirtIO backend support for vDPA that has started to mature more. > > From an operator's view, after getting familiar with OVN, there is a lot of work that needs to be done behind > the scenes in order to get to the desired point. > > * Geneve offloading on NIC, we might need new NICs or new firmware. > * We need to migrate away from VXLAN to Geneve encapsulation, how can we migrate our current baremetal approach > * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red Hat has driven some work to perform this (an Geneve migration) but there is minimal testing or real world deployments that has tried or documented the approach. Yes, that’s definitely something which will require more work. > * And then all misc stuff, we need to look into the new ovn-metadata-agent, should we move Octavia over to OVN yet? For octavia, there is ovn-octavia provider: https://opendev.org/openstack/ovn-octavia-provider which You can use with OVN instead of using Amphora > > Then the final, what do we gain vs what do we lose in terms of maintainability, performance and features. We have document https://docs.openstack.org/neutron/latest/ovn/gaps.html which should describe most of the gaps between ML2/OVS and ML2/OVN backends. We are working on closing those gaps but please also keep in mind that ML2/OVS is not going anywhere, if You need any of features from it, You can still use it as it still is and will be maintained backend :) > > But form an operator's view, I'm very positive to the future of a OVN integrated OpenStack. Thx. I really appreciate this. > > Best regards > Tobias > ________________________________________ > From: Slawek Kaplonski > Sent: Tuesday, June 23, 2020 12:24 PM > To: OpenStack Discuss ML > Cc: Assaf Muller; Daniel Alvarez Sanchez > Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend > > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by the > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different architecture, > moving us away from Python agents communicating with the Neutron API service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and it > works fine. Recently (See [7]) we also proposed to add an OVN based job to the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein cycle > and there were no any significant issues related strictly to this change. It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to > Devstack repo - we don’t want to force everyone else to add “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack > projects to check if it will not break anything in the gate of those projects. > 5. If all will be running fine, merge patches proposed in points 3 and 3a. > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat > > > > — Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Thu Jul 9 10:45:19 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 12:45:19 +0200 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: References: <20200623102448.eocahkszcd354b5d@skaplons-mac> Message-ID: <54AB1156-80B5-442A-8281-3C1165561926@redhat.com> Hi, > On 8 Jul 2020, at 02:18, Goutham Pacha Ravi wrote: > > > > > > On Tue, Jun 23, 2020 at 3:32 AM Slawek Kaplonski wrote: > Hi, > > The Neutron team wants to propose a switch of the default Neutron backend in > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to > OVN with its own ovn-metadata-agent and ovn-controller. > We discussed that change during the virtual PTG - see [1]. > In this document we want to explain reasons why we want to do that change. > > > OVN in 75 Words > --------------- > > Open Virtual Network is managed under the OVS project, and was created by the > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, > using lessons learned throughout the years. It is intended to be used in > projects such as OpenStack and Kubernetes. OVN has a different architecture, > moving us away from Python agents communicating with the Neutron API service > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > Here’s a heap of information about OpenStack’s integration of OVN: > * OpenStack Boston Summit talk on OVN [2] > * Upstream OpenStack networking-ovn documentation [3] and [4] > * OSP 13 OVN documentation, including how to install it using Director [5] > > Neutron OVN driver was developed as a Neutron stadium project, > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main > Neutron repository. > > > Why? > ---- > > In the Neutron team we believe that OVN and the Neutron OVN driver are built > with a modern architecture that offers better foundations for a simpler and > more performant solution. We see increased participation in kubernetes-ovn, > resulting in a larger core OVN community, and we would like OpenStack to > benefit from this Kubernetes driven OVN investment. > Neutron OVN driver currently has got some feature parity gaps comparing to > ML2/OVS (see [6] for details) but our team is working hard to close those gaps > and we believe that this driver is the future for Neutron and that’s why we > want to make it the default Neutron ML2 backend in the Devstack configuration. > > > What Does it Mean? > ------------------ > > Since most Openstack projects use Neutron in their CI and gate jobs, this > change has the potential for a large impact. > But this backend is already tested with various jobs in the Neutron CI and it > works fine. Recently (See [7]) we also proposed to add an OVN based job to the > Devstack’s check queue. > Similarly the default Neutron backend in TripleO was changed in the Stein cycle > and there were no any significant issues related strictly to this change. It > worked well for other projects. > Of course in the Neutron project we will be still gating other drivers, like > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of > some of the jobs. > The Neutron team is *NOT* going to deprecate any of the other existing ML2 > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > drivers in the same way as it is now. > > > Action Plan > ----------- > > We want to make this change before the Victoria-2 milestone to not make such > changes too late in the release cycle. Our action plan is as below: > > 1. Share the plan and get feedback from the upstream community (this thread) > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to > Devstack repo - we don’t want to force everyone else to add “enable_plugin > neutron” in their local.conf file to use default Neutron backend, > 3. Switch default Neutron backend in Devstack to be OVN, > a. Switch definition of base devstack CI jobs that it will run Neutron with > OVN backend, > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack > projects to check if it will not break anything in the gate of those projects. > > +1 This plan looks great. We test Neutron integration quite a bit in OpenStack Manila devstack jobs and in third party CI associated with the project. We've tested OVN in the past and noticed it made share server provisioning faster and more reliable. So I don't think we would be affected negatively should you change the default mechanism and driver. However, please keep us in mind, and perhaps alert me when you post patches so we can test everything is okay. We will for sure alert others to check this in their project when it will be ready. For now Lucas is still working on patches to move ovn bits to the Devstack repo. > > 5. If all will be running fine, merge patches proposed in points 3 and 3a. > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > [7] https://review.opendev.org/#/c/736021/ > > -- > Slawek Kaplonski > Senior software engineer > Red Hat — Slawek Kaplonski Principal software engineer Red Hat From cgoncalves at redhat.com Thu Jul 9 11:13:09 2020 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Thu, 9 Jul 2020 12:13:09 +0100 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <1594028941528.18866@binero.com> <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> Message-ID: On Thu, Jul 9, 2020 at 11:45 AM Slawek Kaplonski wrote: > Hi, > > Thx for Your feedback. > > > On 6 Jul 2020, at 11:49, Tobias Urdin wrote: > > > > Hello Slawek, > > This is very interesting and I think this is the right way to go, > speakin from an operator standpoint here. > > > > We've started investing time in getting familiar with OVN, how to > operate and how to troubleshoot and > > are looking forward into offloading a lot of work to OVN in the future. > > > > We are closely looking how we can integrate hardware offloading with > OVN+OVS to improve our performance > > and in the future looking to the new VirtIO backend support for vDPA > that has started to mature more. > > > > From an operator's view, after getting familiar with OVN, there is a lot > of work that needs to be done behind > > the scenes in order to get to the desired point. > > > > * Geneve offloading on NIC, we might need new NICs or new firmware. > > * We need to migrate away from VXLAN to Geneve encapsulation, how can we > migrate our current baremetal approach > > * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red > Hat has driven some work to perform this (an Geneve migration) but there is > minimal testing or real world deployments that has tried or documented the > approach. > > Yes, that’s definitely something which will require more work. > > > * And then all misc stuff, we need to look into the new > ovn-metadata-agent, should we move Octavia over to OVN yet? > > For octavia, there is ovn-octavia provider: > https://opendev.org/openstack/ovn-octavia-provider which You can use with > OVN instead of using Amphora > Before an attempt at moving from amphora to OVN load balancers, it's worth considering all the existing feature limitations of the OVN provider. OVN load balancers do not support a large feature set typically available in other load balancer solutions. For example, OVN does not support: - Round-robin, weighted round-robin, least connection, source IP, etc. It does only support one balancing algorithm: source IP-Port - HTTP, HTTPS, Proxy protocols. OVN only supports TCP and UDP with limited capabilities (e.g. no timeout knobs) - TLS termination - TLS client authentication - TLS backend encryption - Layer 7 features and header manipulation - Health monitors (WIP) - Octavia flavors - Statistics - Mixed IPv6 and IPv4 VIPs and members. More details in https://docs.openstack.org/octavia/latest/user/feature-classification/index.html > > > > > Then the final, what do we gain vs what do we lose in terms of > maintainability, performance and features. > > We have document https://docs.openstack.org/neutron/latest/ovn/gaps.html > which should describe most of the gaps between ML2/OVS and ML2/OVN backends. > We are working on closing those gaps but please also keep in mind that > ML2/OVS is not going anywhere, if You need any of features from it, You can > still use it as it still is and will be maintained backend :) > > > > > But form an operator's view, I'm very positive to the future of a OVN > integrated OpenStack. > > Thx. I really appreciate this. > > > > > Best regards > > Tobias > > ________________________________________ > > From: Slawek Kaplonski > > Sent: Tuesday, June 23, 2020 12:24 PM > > To: OpenStack Discuss ML > > Cc: Assaf Muller; Daniel Alvarez Sanchez > > Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron > Backend > > > > Hi, > > > > The Neutron team wants to propose a switch of the default Neutron > backend in > > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, > neutron-l3-agent) to > > OVN with its own ovn-metadata-agent and ovn-controller. > > We discussed that change during the virtual PTG - see [1]. > > In this document we want to explain reasons why we want to do that > change. > > > > > > OVN in 75 Words > > --------------- > > > > Open Virtual Network is managed under the OVS project, and was created > by the > > original authors of OVS. It is an attempt to re-do the ML2/OVS control > plane, > > using lessons learned throughout the years. It is intended to be used in > > projects such as OpenStack and Kubernetes. OVN has a different > architecture, > > moving us away from Python agents communicating with the Neutron API > service > > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. > > > > Here’s a heap of information about OpenStack’s integration of OVN: > > * OpenStack Boston Summit talk on OVN [2] > > * Upstream OpenStack networking-ovn documentation [3] and [4] > > * OSP 13 OVN documentation, including how to install it using Director > [5] > > > > Neutron OVN driver was developed as a Neutron stadium project, > > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into > the main > > Neutron repository. > > > > > > Why? > > ---- > > > > In the Neutron team we believe that OVN and the Neutron OVN driver are > built > > with a modern architecture that offers better foundations for a simpler > and > > more performant solution. We see increased participation in > kubernetes-ovn, > > resulting in a larger core OVN community, and we would like OpenStack to > > benefit from this Kubernetes driven OVN investment. > > Neutron OVN driver currently has got some feature parity gaps comparing > to > > ML2/OVS (see [6] for details) but our team is working hard to close > those gaps > > and we believe that this driver is the future for Neutron and that’s why > we > > want to make it the default Neutron ML2 backend in the Devstack > configuration. > > > > > > What Does it Mean? > > ------------------ > > > > Since most Openstack projects use Neutron in their CI and gate jobs, this > > change has the potential for a large impact. > > But this backend is already tested with various jobs in the Neutron CI > and it > > works fine. Recently (See [7]) we also proposed to add an OVN based job > to the > > Devstack’s check queue. > > Similarly the default Neutron backend in TripleO was changed in the > Stein cycle > > and there were no any significant issues related strictly to this > change. It > > worked well for other projects. > > Of course in the Neutron project we will be still gating other drivers, > like > > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the > names of > > some of the jobs. > > The Neutron team is *NOT* going to deprecate any of the other existing > ML2 > > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree > > drivers in the same way as it is now. > > > > > > Action Plan > > ----------- > > > > We want to make this change before the Victoria-2 milestone to not make > such > > changes too late in the release cycle. Our action plan is as below: > > > > 1. Share the plan and get feedback from the upstream community (this > thread) > > 2. Move OVN related Devstack code from a plugin defined in the Neutron > repo to > > Devstack repo - we don’t want to force everyone else to add > “enable_plugin > > neutron” in their local.conf file to use default Neutron backend, > > 3. Switch default Neutron backend in Devstack to be OVN, > > a. Switch definition of base devstack CI jobs that it will run Neutron > with > > OVN backend, > > 4. Propose DNM patches depend on patch from point 3 and 3a to main > OpenStack > > projects to check if it will not break anything in the gate of those > projects. > > 5. If all will be running fine, merge patches proposed in points 3 and > 3a. > > > > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - > 193 > > [2] https://www.youtube.com/watch?v=sgc7myiX6ts > > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html > > [4] https://docs.openstack.org/neutron/latest/ovn/index.html > > [5] > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ > > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html > > [7] https://review.opendev.org/#/c/736021/ > > > > -- > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > > > > > > > — > Slawek Kaplonski > Principal software engineer > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Thu Jul 9 11:53:29 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Thu, 9 Jul 2020 08:53:29 -0300 Subject: [neutron] bandwidth metering based on remote address In-Reply-To: <25308951.foNqEPruJI@antares> References: <2890841.xduM2AgYMW@antares> <25308951.foNqEPruJI@antares> Message-ID: I created a bug track for the extension of the neutron metering granularities: https://bugs.launchpad.net/neutron/+bug/1886949 I am never sure about those "paper work", I normally propose the pull requests, and wait for the guidance of the community. About the source/destination filtering, I have not published anything yet. So far, we defined/specified what we need/want from the Neutron metering sub-system. Next week I am supposed to start on this matter. Therefore, as soon as I have updates, I will create the bug report, and pull requests. You can help me now by reviewing the PR I already have open, and of course, testing/using it :) On Thu, Jul 9, 2020 at 3:54 AM Jonas Schäfer < jonas.schaefer at cloudandheat.com> wrote: > Hello Rafael, > > On Dienstag, 7. Juli 2020 14:09:29 CEST Rafael Weingärtner wrote: > > Hallo Jonas, > > I have worked to address this specific use case. > > > > First, the part of the solution that is already implemented. If you only > > need to gather metrics in a tenant fashion, you can take a look into this > > PR: https://review.opendev.org/#/c/735605/. That pull request enables > > operators to configure shared traffic labels, and then, these traffic > > labels will be exposed/published with different granularities. The > > different granularities are router, tenant, label, router-label, and > > tenant-label. The complete explanation can be found in the "RST" document > > that the PR also introduces, where we wrote a complete description of > > neutron metering, its configs, and usage. You are welcome to review and > > help us get this PR merged :) > > This already looks very useful to us, since it saves us from creating > labels > for each and every project. > > > So far, if all you need is to measure the whole traffic, but in different > > granularities, that PR will probably be enough. > > Not quite; as mentioned, we’ll need to carve out specific network areas > from > metering, those which are in our DCs, but on the other side of the router > from > the customer perspective. > > > On the other hand, if you > > need to create more complex rules to filter by source/destination IPs, > then > > we need something else. Interestingly enough, we are working towards > that. > > We will extend neutron API, and neutron metering to allow operators to > use > > "remote-ip" and "source-ip" to create metering labels rules. > > That sounds exactly like what we’d need. > > > We also saw the PR that changed the behavior of the "remote-ip" > property, > > and the whole confusion it caused (at least for us). However, instead of > > proposing to revert it, we are working towards enabling the API to handle > > "remote-ip" and "source-ip", which will cover the use case of the person > > that introduced that commit, and many others such as ours and yours > > (probably). > > Sounds good. Is there a way we can collaborate on this? Is there a > launchpad > bug which tracks that? (Also, is there a launchpad thing for the shared > label > granularity you’re doing already? I didn’t find one mentioned on the > gerrit > page.) > > kind regards, > Jonas Schäfer > > > > > On Tue, Jul 7, 2020 at 5:47 AM Jonas Schäfer < > > > > jonas.schaefer at cloudandheat.com> wrote: > > > Dear list, > > > > > > We are trying to implement tenant bandwidth metering at the neutron > router > > > level. Since some of the network spaces connected to the external > > > interface of > > > the neutron router are supposed to be unmetered, we need to match on > the > > > remote address. > > > > > > Conveniently, there exists a --remote-ip-prefix option on meter label > > > create; > > > however, since [1], its meaning was changed to the exact opposite: > Instead > > > of > > > matching on the *remote* prefix (towards the external interface), it > > > matches > > > on the *local* prefix (towards the OS tenant network). > > > > > > In an ideal world, we would want to revert that change and instead > > > introduce a > > > --local-ip-prefix option which covers that use-case. I suppose this is > not > > > a > > > thing we *should* do though, given that this change made it into a few > > > releases already. > > > > > > Instead, we’ll have to create a new option (which whatever name) + > > > associated > > > database schema + iptables rule patterns to implement the feature. > > > > > > The questions associated with this are now: > > > > > > - Does this make absolutely no sense to anyone? > > > - What is the process for this? I suppose since this change was made > > > intentionally and passed review, our desired change needs to go > through a > > > feature request process (blueprints maybe?). > > > > > > kind regards, > > > Jonas Schäfer > > > > > > [1]: https://opendev.org/openstack/neutron/commit/ > > > > > > 92db1d4a2c49b1f675b6a9552a8cc5a417973b64 > > > > > > > > > -- > > > Jonas Schäfer > > > DevOps Engineer > > > > > > Cloud&Heat Technologies GmbH > > > Königsbrücker Straße 96 | 01099 Dresden > > > +49 351 479 367 37 > > > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > > > > > New Service: > > > Managed Kubernetes designed for AI & ML > > > https://managed-kubernetes.cloudandheat.com/ > > > > > > Commercial Register: District Court Dresden > > > Register Number: HRB 30549 > > > VAT ID No.: DE281093504 > > > Managing Director: Nicolas Röhrs > > > Authorized signatory: Dr. Marius Feldmann > > > Authorized signatory: Kristina Rübenkamp > > > -- > Jonas Schäfer > DevOps Engineer > > Cloud&Heat Technologies GmbH > Königsbrücker Straße 96 | 01099 Dresden > +49 351 479 367 37 > jonas.schaefer at cloudandheat.com | www.cloudandheat.com > > New Service: > Managed Kubernetes designed for AI & ML > https://managed-kubernetes.cloudandheat.com/ > > Commercial Register: District Court Dresden > Register Number: HRB 30549 > VAT ID No.: DE281093504 > Managing Director: Nicolas Röhrs > Authorized signatory: Dr. Marius Feldmann > Authorized signatory: Kristina Rübenkamp > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Thu Jul 9 12:40:11 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 9 Jul 2020 14:40:11 +0200 Subject: [neutron] Drivers meeting - agenda for 10.07.2020 Message-ID: <772B957B-86AB-4BD7-A594-60932F2A8304@redhat.com> Hi, For tomorrow’s meeting we have 2 RFEs to discuss: https://bugs.launchpad.net/neutron/+bug/1886798 - [RFE] Port NUMA affinity policy - this one was already discussed briefly few weeks ago - see http://eavesdrop.openstack.org/meetings/neutron_drivers/2020/neutron_drivers.2020-06-19-14.00.log.html#l-17 - but now as Rodolfo proposed official RFE, lets talk again about it, https://bugs.launchpad.net/neutron/+bug/1880532 - [RFE]L3 Router should support ECMP - this one was also discussed some time ago, owner of the rfe provided some additional info recently so please take a look into that and we will also discuss that tomorrow. Have a great day and see You on tomorrow’s meeting :) — Slawek Kaplonski Principal software engineer Red Hat From amuller at redhat.com Thu Jul 9 12:48:55 2020 From: amuller at redhat.com (Assaf Muller) Date: Thu, 9 Jul 2020 08:48:55 -0400 Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend In-Reply-To: References: <20200623102448.eocahkszcd354b5d@skaplons-mac> <1594028941528.18866@binero.com> <68ADA4CA-3C3B-440C-82B3-7F218750DE76@redhat.com> Message-ID: On Thu, Jul 9, 2020 at 7:17 AM Carlos Goncalves wrote: > > > > On Thu, Jul 9, 2020 at 11:45 AM Slawek Kaplonski wrote: >> >> Hi, >> >> Thx for Your feedback. >> >> > On 6 Jul 2020, at 11:49, Tobias Urdin wrote: >> > >> > Hello Slawek, >> > This is very interesting and I think this is the right way to go, speakin from an operator standpoint here. >> > >> > We've started investing time in getting familiar with OVN, how to operate and how to troubleshoot and >> > are looking forward into offloading a lot of work to OVN in the future. >> > >> > We are closely looking how we can integrate hardware offloading with OVN+OVS to improve our performance >> > and in the future looking to the new VirtIO backend support for vDPA that has started to mature more. >> > >> > From an operator's view, after getting familiar with OVN, there is a lot of work that needs to be done behind >> > the scenes in order to get to the desired point. >> > >> > * Geneve offloading on NIC, we might need new NICs or new firmware. >> > * We need to migrate away from VXLAN to Geneve encapsulation, how can we migrate our current baremetal approach >> > * We need to have Neutron migrate from ML2 OVS to ML2 OVN, I know Red Hat has driven some work to perform this (an Geneve migration) but there is minimal testing or real world deployments that has tried or documented the approach. >> >> Yes, that’s definitely something which will require more work. >> >> > * And then all misc stuff, we need to look into the new ovn-metadata-agent, should we move Octavia over to OVN yet? >> >> For octavia, there is ovn-octavia provider: https://opendev.org/openstack/ovn-octavia-provider which You can use with OVN instead of using Amphora > > > Before an attempt at moving from amphora to OVN load balancers, it's worth considering all the existing feature limitations of the OVN provider. > > OVN load balancers do not support a large feature set typically available in other load balancer solutions. For example, OVN does not support: > > - Round-robin, weighted round-robin, least connection, source IP, etc. It does only support one balancing algorithm: source IP-Port > - HTTP, HTTPS, Proxy protocols. OVN only supports TCP and UDP with limited capabilities (e.g. no timeout knobs) > - TLS termination > - TLS client authentication > - TLS backend encryption > - Layer 7 features and header manipulation > - Health monitors (WIP) > - Octavia flavors > - Statistics > - Mixed IPv6 and IPv4 VIPs and members. > > More details in https://docs.openstack.org/octavia/latest/user/feature-classification/index.html Exactly. The Amphora and OVN drivers: a) Can be loaded at the same time b) Users can choose which driver to use per LB c) Are complementary, they don't replace one another The intention is that you could use an OVN based LB for 'simple' use cases, where you don't require any of the functionality Carlos highlighted above, and Amphora for the rest. The assumption here is that for simple use cases OVN based LBs perform and scale better, though we haven't quite been able to confirm that yet. > >> >> >> > >> > Then the final, what do we gain vs what do we lose in terms of maintainability, performance and features. >> >> We have document https://docs.openstack.org/neutron/latest/ovn/gaps.html which should describe most of the gaps between ML2/OVS and ML2/OVN backends. >> We are working on closing those gaps but please also keep in mind that ML2/OVS is not going anywhere, if You need any of features from it, You can still use it as it still is and will be maintained backend :) >> >> > >> > But form an operator's view, I'm very positive to the future of a OVN integrated OpenStack. >> >> Thx. I really appreciate this. >> >> > >> > Best regards >> > Tobias >> > ________________________________________ >> > From: Slawek Kaplonski >> > Sent: Tuesday, June 23, 2020 12:24 PM >> > To: OpenStack Discuss ML >> > Cc: Assaf Muller; Daniel Alvarez Sanchez >> > Subject: [All][Neutron][Devstack] OVN as the Default Devstack Neutron Backend >> > >> > Hi, >> > >> > The Neutron team wants to propose a switch of the default Neutron backend in >> > Devstack from OVS (neutron-ovs-agent, neutron-dhcp-agent, neutron-l3-agent) to >> > OVN with its own ovn-metadata-agent and ovn-controller. >> > We discussed that change during the virtual PTG - see [1]. >> > In this document we want to explain reasons why we want to do that change. >> > >> > >> > OVN in 75 Words >> > --------------- >> > >> > Open Virtual Network is managed under the OVS project, and was created by the >> > original authors of OVS. It is an attempt to re-do the ML2/OVS control plane, >> > using lessons learned throughout the years. It is intended to be used in >> > projects such as OpenStack and Kubernetes. OVN has a different architecture, >> > moving us away from Python agents communicating with the Neutron API service >> > via RabbitMQ to C daemons communicating via OpenFlow and OVSDB. >> > >> > Here’s a heap of information about OpenStack’s integration of OVN: >> > * OpenStack Boston Summit talk on OVN [2] >> > * Upstream OpenStack networking-ovn documentation [3] and [4] >> > * OSP 13 OVN documentation, including how to install it using Director [5] >> > >> > Neutron OVN driver was developed as a Neutron stadium project, >> > "networking-ovn". In the Ussuri cycle, networking-ovn was merged into the main >> > Neutron repository. >> > >> > >> > Why? >> > ---- >> > >> > In the Neutron team we believe that OVN and the Neutron OVN driver are built >> > with a modern architecture that offers better foundations for a simpler and >> > more performant solution. We see increased participation in kubernetes-ovn, >> > resulting in a larger core OVN community, and we would like OpenStack to >> > benefit from this Kubernetes driven OVN investment. >> > Neutron OVN driver currently has got some feature parity gaps comparing to >> > ML2/OVS (see [6] for details) but our team is working hard to close those gaps >> > and we believe that this driver is the future for Neutron and that’s why we >> > want to make it the default Neutron ML2 backend in the Devstack configuration. >> > >> > >> > What Does it Mean? >> > ------------------ >> > >> > Since most Openstack projects use Neutron in their CI and gate jobs, this >> > change has the potential for a large impact. >> > But this backend is already tested with various jobs in the Neutron CI and it >> > works fine. Recently (See [7]) we also proposed to add an OVN based job to the >> > Devstack’s check queue. >> > Similarly the default Neutron backend in TripleO was changed in the Stein cycle >> > and there were no any significant issues related strictly to this change. It >> > worked well for other projects. >> > Of course in the Neutron project we will be still gating other drivers, like >> > ML2/Linuxbridge and ML2/OVS - nothing will change here, except for the names of >> > some of the jobs. >> > The Neutron team is *NOT* going to deprecate any of the other existing ML2 >> > drivers. We will be still maintaining Linuxbridge, OVS and other in-tree >> > drivers in the same way as it is now. >> > >> > >> > Action Plan >> > ----------- >> > >> > We want to make this change before the Victoria-2 milestone to not make such >> > changes too late in the release cycle. Our action plan is as below: >> > >> > 1. Share the plan and get feedback from the upstream community (this thread) >> > 2. Move OVN related Devstack code from a plugin defined in the Neutron repo to >> > Devstack repo - we don’t want to force everyone else to add “enable_plugin >> > neutron” in their local.conf file to use default Neutron backend, >> > 3. Switch default Neutron backend in Devstack to be OVN, >> > a. Switch definition of base devstack CI jobs that it will run Neutron with >> > OVN backend, >> > 4. Propose DNM patches depend on patch from point 3 and 3a to main OpenStack >> > projects to check if it will not break anything in the gate of those projects. >> > 5. If all will be running fine, merge patches proposed in points 3 and 3a. >> > >> > [1] https://etherpad.opendev.org/p/neutron-victoria-ptg - Lines 185 - 193 >> > [2] https://www.youtube.com/watch?v=sgc7myiX6ts >> > [3] https://docs.openstack.org/neutron/latest/admin/ovn/index.html >> > [4] https://docs.openstack.org/neutron/latest/ovn/index.html >> > [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/networking_with_open_virtual_network/ >> > [6] https://docs.openstack.org/neutron/latest/ovn/gaps.html >> > [7] https://review.opendev.org/#/c/736021/ >> > >> > -- >> > Slawek Kaplonski >> > Senior software engineer >> > Red Hat >> > >> > >> > >> > >> >> — >> Slawek Kaplonski >> Principal software engineer >> Red Hat >> >> From radoslaw.piliszek at gmail.com Thu Jul 9 14:32:16 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 9 Jul 2020 16:32:16 +0200 Subject: [kolla] Today is the first Kall Message-ID: Hiya Folks, today (09 Jul) is the first Kolla's Kall [1]. It starts at 15:00 UTC so in just a bit less than half an hour. (Sorry for the late reminder, these days don't spare me.) Today's agenda is based on one of the top priorities for Victoria - DA DOCS. We decided to give Meetpad a try. It does not record meetings but we will document the meeting on etherpad anyhow. The link is on the referenced wiki page. (The potential fallback will be Google Meet, we will update the wiki). Everyone is free to join. Kolla Kall is development-oriented and focuses on implementation discussion, change planning, release planning, housekeeping, etc. The expected audience is people interested in Kolla projects development, including Kolla, Kolla-Ansible and Kayobe. Looking forward to seeing YOU there. [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall -yoctozepto From arxcruz at redhat.com Thu Jul 9 15:14:58 2020 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 9 Jul 2020 17:14:58 +0200 Subject: [qa][tempest] Update language in tempest code base Message-ID: Hello, I would like to start a discussion regarding the topic. At this moment in time we have an opportunity to be a more open and inclusive project by eliminating outdated naming conventions from tempest codebase, such as blacklist, whitelist. We should take the opportunity and do our best to replace outdated terms with their more inclusive alternatives. As you can see in [1] the TripleO project is already working on this initiative, and I would like to work on this as well on the tempest side. Any thoughts? Shall I start with a sepc, adding deprecation warnings? [1] https://review.opendev.org/#/c/740013/1/specs/victoria/renaming_rules.rst Kind regards, -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From emccormick at cirrusseven.com Thu Jul 9 15:51:35 2020 From: emccormick at cirrusseven.com (Erik McCormick) Date: Thu, 9 Jul 2020 11:51:35 -0400 Subject: Hardware requirement for OpenStack HA Cluster In-Reply-To: References: Message-ID: On Thu, Jul 9, 2020 at 4:25 AM Anil Jangam wrote: > Hi All, > > I am looking for hardware requirements (CPU, RAM, HDD) for installing a > OpenStack HA cluster. > So far, I gathered few references: > > - This article talks about CPU and HDD, but they do not comment on > RAM. > - > https://docs.openstack.org/project-deploy-guide/openstack-ansible/ocata/overview-requirements.html > - This article talks about CPU, RAM, and HDD, but it is quite old > (2015) reference. > - > https://docs.huihoo.com/openstack/docs.openstack.org/ha-guide/HAGuide.pdf > (Page 6) > > I am considering the cluster with: 3 Controller (for HA) + 1 Compute + 1 > Storage. > > I have following questions: > > - What is the minimum hardware (CPU, RAM, HDD) requirement to install > a OpenStack HA cluster? > > For memory, you could probably get away with 16 GB on the controllers, but I would go at least 32. I have 64 in mine. My lightly loaded dev cluster sits at about 15GB used under light load. For a small cluster, I wouldn't go less than 4 cores. A single 8 core CPU will be plenty. If you think you're going to grow it and make heavy use of the APIs then double it. For HDD, you can get away with like 100 GB or even less, but you need to account for your Glance images assuming you're storing them locally. You'll also need space for logging if you're going ot deploy an ELK (or EFK) stack with it. Databases are fairly small. In a cluster with only a few compute nodes, they probably will be around 5 or 6 GB total. If you can throw a 1TB SSD at it, that should be plenty for a small cluster. > > - Can we have 3 Controller nodes installed on 3 Virtual Machines or do > we need 3 independent (bare metal) servers? > - So in case of VM-based controllers, the cluster will be hybrid in > nature. > - I do not know if this is even possible and a recommended design. > > I guess it depends on your threshold for failure. It seems to me to defeat the purpose of HA to stick everything on one physical box. It's certainly fine for testing / demonstration purposes. Is it supported? Sure. Is it recommended? No. > > - > - Do we need the Platform Director node in addition to controller and > compute/storage nodes? > > I am not familiar with OSA enough to say for sure, but I don't think so. You should be able to deploy with 'localhost' in your inventory as one of your controllers. You can also simply run the deployment from a linux VM on a laptop if you want. You shouldn't have to dedicate something. That being said, if you have a box where those things can live and be used repeatedly for reconfiguration and upgrade, it would probably make your life less complicated. > Thanks in advance. > Anil. > > > -Erik -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jul 9 15:57:14 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 09 Jul 2020 10:57:14 -0500 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: References: Message-ID: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz wrote ---- > Hello, > I would like to start a discussion regarding the topic. > At this moment in time we have an opportunity to be a more open and inclusive project by eliminating outdated naming conventions from tempest codebase, such as blacklist, whitelist.We should take the opportunity and do our best to replace outdated terms with their more inclusive alternatives.As you can see in [1] the TripleO project is already working on this initiative, and I would like to work on this as well on the tempest side. Thanks Arx for raising it. I always have hard time to understand the definition of 'outdated naming conventions ' are they outdated from coding language perspective or outdated as English language perspective? I do not see naming used in coding language should be matched with English as grammar/outdated/new style language. As long as they are not so bad (hurt anyone culture, abusing word etc) it is fine to keep them as it is and start adopting new names for new things we code. For me, naming convention are the things which always can be improved over time, none of the name is best suited for everyone in open source. But we need to understand whether it is worth to do in term of 1. effort of changing those 2. un- comfortness of adopting new names 3. again changing in future. At least from Tempest perspective, blacklist is very known common word used for lot of interfaces and dependent testing tool. I cannot debate on how good it is or bad but i can debate on not-worth to change now. For new interface, we can always use best-suggested name as per that time/culture/maintainers. We have tried few of such improvement in past but end up not-successful. Example: - https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a7f493b6086cab9/tempest/test.py#L43 -gmann > > Any thoughts? Shall I start with a sepc, adding deprecation warnings? > > [1] https://review.opendev.org/#/c/740013/1/specs/victoria/renaming_rules.rst > Kind regards, > > > -- > Arx Cruz > Software Engineer > Red Hat EMEA > arxcruz at redhat.com > @RedHat Red Hat Red Hat > From peljasz at yahoo.co.uk Thu Jul 9 16:00:22 2020 From: peljasz at yahoo.co.uk (lejeczek) Date: Thu, 9 Jul 2020 17:00:22 +0100 Subject: RDO - ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' References: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e.ref@yahoo.co.uk> Message-ID: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> Hi guys, I've packstaked a deployment with: CONFIG_CINDER_BACKEND=gluster CONFIG_CINDER_VOLUMES_CREATE=y CONFIG_CINDER_GLUSTER_MOUNTS=127.0.0.1:/VMs But after seemingly all work okey I keep getting: 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume Traceback (most recent call last): 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/cmd/volume.py", line 103, in _launch_service 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     cluster=cluster) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/service.py", line 400, in create 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     cluster=cluster, **kwargs) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/service.py", line 155, in __init__ 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     *args, **kwargs) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 267, in __init__ 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     active_backend_id=curr_active_backend_id) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 44, in import_object 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     return import_class(import_str)(*args, **kwargs) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume   File "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 30, in import_class 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume     __import__(mod_str) 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume ... I'm Centos 8 with "ussuri". Would you know & share a solution? many thanks, L. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 9 16:05:54 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 9 Jul 2020 18:05:54 +0200 Subject: [kolla] Today is the first Kall In-Reply-To: References: Message-ID: And it's a wrap! The first Kall was pretty successful and we left the notes in https://etherpad.opendev.org/p/kollakall Thanks all for joining the first Kall! See you next time! (In two weeks time) -yoctozepto On Thu, Jul 9, 2020 at 4:32 PM Radosław Piliszek wrote: > > Hiya Folks, > > today (09 Jul) is the first Kolla's Kall [1]. > It starts at 15:00 UTC so in just a bit less than half an hour. > (Sorry for the late reminder, these days don't spare me.) > > Today's agenda is based on one of the top priorities for Victoria - DA DOCS. > We decided to give Meetpad a try. It does not record meetings but we > will document the meeting on etherpad anyhow. > The link is on the referenced wiki page. > (The potential fallback will be Google Meet, we will update the wiki). > > Everyone is free to join. Kolla Kall is development-oriented and > focuses on implementation discussion, change planning, release > planning, housekeeping, etc. The expected audience is people > interested in Kolla projects development, including Kolla, > Kolla-Ansible and Kayobe. > > Looking forward to seeing YOU there. > > [1] https://wiki.openstack.org/wiki/Meetings/Kolla/Kall > > -yoctozepto From waboring at hemna.com Thu Jul 9 16:10:15 2020 From: waboring at hemna.com (Walter Boring) Date: Thu, 9 Jul 2020 12:10:15 -0400 Subject: RDO - ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' In-Reply-To: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> References: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e.ref@yahoo.co.uk> <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> Message-ID: Glusterfs driver was deprecated in the Newton release and removed in the Ocata release. https://docs.openstack.org/releasenotes/cinder/ocata.html On Thu, Jul 9, 2020 at 12:05 PM lejeczek wrote: > Hi guys, > > I've packstaked a deployment with: > > CONFIG_CINDER_BACKEND=gluster > CONFIG_CINDER_VOLUMES_CREATE=y > CONFIG_CINDER_GLUSTER_MOUNTS=127.0.0.1:/VMs > > But after seemingly all work okey I keep getting: > > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume Traceback (most > recent call last): > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/cmd/volume.py", line 103, in > _launch_service > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > cluster=cluster) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/service.py", line 400, in create > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > cluster=cluster, **kwargs) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/service.py", line 155, in __init__ > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume *args, > **kwargs) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/cinder/volume/manager.py", line 267, in > __init__ > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > active_backend_id=curr_active_backend_id) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 44, in > import_object > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume return > import_class(import_str)(*args, **kwargs) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume File > "/usr/lib/python3.6/site-packages/oslo_utils/importutils.py", line 30, in > import_class > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > __import__(mod_str) > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ... > > I'm Centos 8 with "ussuri". > Would you know & share a solution? > > many thanks, L. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ltoscano at redhat.com Thu Jul 9 16:13:14 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Thu, 09 Jul 2020 18:13:14 +0200 Subject: RDO - ModuleNotFoundError: No module named 'cinder.volume.drivers.glusterfs' In-Reply-To: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> References: <7f7de2b5-c14a-31a0-4be1-009a9c09af6e.ref@yahoo.co.uk> <7f7de2b5-c14a-31a0-4be1-009a9c09af6e@yahoo.co.uk> Message-ID: <3161947.k3LOHGUjKi@whitebase.usersys.redhat.com> On Thursday, 9 July 2020 18:00:22 CEST lejeczek wrote: > Hi guys, > > I've packstaked a deployment with: > > CONFIG_CINDER_BACKEND=gluster > CONFIG_CINDER_VOLUMES_CREATE=y > CONFIG_CINDER_GLUSTER_MOUNTS=127.0.0.1:/VMs > > But after seemingly all work okey I keep getting: > > [...] > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ModuleNotFoundError: No module named > 'cinder.volume.drivers.glusterfs' > 2020-07-09 16:18:03.157 2547017 ERROR cinder.cmd.volume > ... > > I'm Centos 8 with "ussuri". > Would you know & share a solution? The glusterfs volume driver for cinder was deprecated in the newton release and removed during the pike cycle: https://review.opendev.org/#/c/377028/ There is still a glusterfs *backup* driver, not sure about its status though. -- Luigi From ltoscano at redhat.com Thu Jul 9 16:15:11 2020 From: ltoscano at redhat.com (Luigi Toscano) Date: Thu, 09 Jul 2020 18:15:11 +0200 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> Message-ID: <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz wrote > ---- > > Hello, > > I would like to start a discussion regarding the topic. > > At this moment in time we have an opportunity to be a more open and > > inclusive project by eliminating outdated naming conventions from > > tempest codebase, such as blacklist, whitelist.We should take the > > opportunity and do our best to replace outdated terms with their more > > inclusive alternatives.As you can see in [1] the TripleO project is > > already working on this initiative, and I would like to work on this as > > well on the tempest side. > Thanks Arx for raising it. > > I always have hard time to understand the definition of 'outdated naming > conventions ' are they outdated from coding language perspective or > outdated as English language perspective? I do not see naming used in > coding language should be matched with English as grammar/outdated/new > style language. As long as they are not so bad (hurt anyone culture, > abusing word etc) it is fine to keep them as it is and start adopting new > names for new things we code. > > For me, naming convention are the things which always can be improved over > time, none of the name is best suited for everyone in open source. But we > need to understand whether it is worth to do in term of 1. effort of > changing those 2. un- comfortness of adopting new names 3. again changing > in future. > > At least from Tempest perspective, blacklist is very known common word used > for lot of interfaces and dependent testing tool. I cannot debate on how > good it is or bad but i can debate on not-worth to change now. For new > interface, we can always use best-suggested name as per that > time/culture/maintainers. We have tried few of such improvement in past but > end up not-successful. Example: - > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > 7f493b6086cab9/tempest/test.py#L43 > That's not the only used terminology for list of things, though. We could always add new interfaces and keep the old ones are deprecated (but not advertised) for the foreseable future. The old code won't be broken and the new one would use the new terminology, I'd say it's a good solution. -- Luigi From elod.illes at est.tech Thu Jul 9 16:27:21 2020 From: elod.illes at est.tech (=?UTF-8?B?RWzFkWQgSWxsw6lz?=) Date: Thu, 9 Jul 2020 18:27:21 +0200 Subject: [ops][cinder] festival of EOL - ocata and pike In-Reply-To: References: Message-ID: <8225c61e-687c-0116-da07-52443f315e43@est.tech> Hi, Sorry for sticking my nose into this thread (again o:)), just a couple of thoughts: - we had a rough month with failing Devstack and Tempest (and other) jobs, but thanks to Gmann and others we could fix most of the issues (except Tempest in Ocata, that's why it is announced generally as Unmaintained [0]) - this added some extra time to show a branch as unmaintained - branches in extended maintenance are not that busy branches, but still, I see some bugfix backports coming in even in Pike (in spite of failing gate in the last month) - Lee announced nova's Unmaintained state in the same circumstances, as we just fixed Pike's devstack - and I also sent a reply that I will continue to maintain nova's stable/pike as it is getting in a better shape now Last but not least: in cinder, there are "Zuul +1"d gate fixes both for Pike [1] (and Queens [2]), so it's not that hopeless. I don't want to keep a broken branch open in any cost, but does it cost that much? I mean, if there is the possibility to push a fix, why don't we let it happen? Right now Cinder Pike's gate seems working (with the fix, which needs an approve [1]). My suggestion is that let Pike still be in Extended Maintenance as it is still have a working gate ([1]) and EOL Ocata as it was already about to happen according to the mail thread [0], if necessary. Also, please check the steps in 'End of Life' chapter of the stable guideline [3] and let me offer my help if you need it for the transition. Cheers, Előd [0] http://lists.openstack.org/pipermail/openstack-discuss/2020-May/thread.html#15112 [1] https://review.opendev.org/#/c/737094/ [2] https://review.opendev.org/#/c/737093/ [3] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life On 2020. 07. 08. 23:14, Brian Rosmaita wrote: > Lee Yarwood recently announced the change to 'unmaintained' status of > nova stable/ocata [0] and stable/pike [1] branches, with the clever > idea of back-dating the 6 month period of un-maintenance to the most > recent commit to each branch.  I took a look at cinder stable/ocata > and stable/pike, and the most recent commit to each is 8 months ago > and 7 months ago, respectively. > > The Cinder team discussed this at today's Cinder meeting and agreed > that this email will serve as notice to the OpenStack Community that > the following openstack/cinder branches have been in 'unmaintained' > status for the past 6 months: > - stable/ocata > - stable/pike > > The Cinder team hereby serves notice that it is our intent to ask the > openstack infra team to tag each as EOL at its current HEAD and delete > the branches two weeks from today, that is, on Wednesday, 22 July 2020. > > (This applies also to the other stable-branched cinder repositories, > that is, os-brick, python-cinderclient, and > python-cinderclient-extension.) > > Please see [2] for information about the maintenance phases and what > action would need to occur before 22 July for a branch to be adopted > back to the 'extended maintenance' phase. > > On behalf of the Cinder team, thank you for your attention to this > matter. > > > cheers, > brian > > > [0] > http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015747.html > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015798.html > [2] https://docs.openstack.org/project-team-guide/stable-branches.html > From arxcruz at redhat.com Thu Jul 9 16:45:19 2020 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 9 Jul 2020 18:45:19 +0200 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> Message-ID: Yes, that's the idea. We can keep the old interface for a few cycles, with warning deprecation message advertising to use the new one, and then remove in the future. Kind regards, On Thu, Jul 9, 2020 at 6:15 PM Luigi Toscano wrote: > On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz > wrote > > ---- > > > Hello, > > > I would like to start a discussion regarding the topic. > > > At this moment in time we have an opportunity to be a more open and > > > inclusive project by eliminating outdated naming conventions from > > > tempest codebase, such as blacklist, whitelist.We should take the > > > opportunity and do our best to replace outdated terms with their more > > > inclusive alternatives.As you can see in [1] the TripleO project is > > > already working on this initiative, and I would like to work on this > as > > > well on the tempest side. > > Thanks Arx for raising it. > > > > I always have hard time to understand the definition of 'outdated naming > > conventions ' are they outdated from coding language perspective or > > outdated as English language perspective? I do not see naming used in > > coding language should be matched with English as grammar/outdated/new > > style language. As long as they are not so bad (hurt anyone culture, > > abusing word etc) it is fine to keep them as it is and start adopting new > > names for new things we code. > > > > For me, naming convention are the things which always can be improved > over > > time, none of the name is best suited for everyone in open source. But we > > need to understand whether it is worth to do in term of 1. effort of > > changing those 2. un- comfortness of adopting new names 3. again changing > > in future. > > > > At least from Tempest perspective, blacklist is very known common word > used > > for lot of interfaces and dependent testing tool. I cannot debate on how > > good it is or bad but i can debate on not-worth to change now. For new > > interface, we can always use best-suggested name as per that > > time/culture/maintainers. We have tried few of such improvement in past > but > > end up not-successful. Example: - > > > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > > 7f493b6086cab9/tempest/test.py#L43 > > > > That's not the only used terminology for list of things, though. We could > always add new interfaces and keep the old ones are deprecated (but not > advertised) for the foreseable future. The old code won't be broken and > the > new one would use the new terminology, I'd say it's a good solution. > > > -- > Luigi > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 9 16:50:35 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 9 Jul 2020 18:50:35 +0200 Subject: [TripleO]Documentation to list all options in yaml file and possible values Message-ID: Hi all, 1) Is there a page or a draft, where all options of TripleO are available? 2) Is there a page or a draft, where dependencies of each option are listed? 3) Is there a page or a draft, where all possible values for each option would be listed? -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jul 9 17:06:39 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 09 Jul 2020 12:06:39 -0500 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> Message-ID: <173348b1df7.b5898c11633886.9175405555090897907@ghanshyammann.com> ---- On Thu, 09 Jul 2020 11:45:19 -0500 Arx Cruz wrote ---- > Yes, that's the idea. > We can keep the old interface for a few cycles, with warning deprecation message advertising to use the new one, and then remove in the future. Deprecating things leads to two situations which really need some good reason before doing it: - If we keep the deprecated interfaces working along with new interfaces then it is confusion for users as well as maintenance effort. In my experience, very less migration happen to new things if old keep working. - If we remove them in future then it is breaking change. IMO, we need to first ask/analyse whether name changes are worth to do with above things as results. Or in other team we should first define what is 'outdated naming conventions' and how worth to fix those. -gmann > Kind regards, > > On Thu, Jul 9, 2020 at 6:15 PM Luigi Toscano wrote: > > > -- > Arx Cruz > Software Engineer > Red Hat EMEA > arxcruz at redhat.com > @RedHat Red Hat Red Hat > On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz wrote > > ---- > > > Hello, > > > I would like to start a discussion regarding the topic. > > > At this moment in time we have an opportunity to be a more open and > > > inclusive project by eliminating outdated naming conventions from > > > tempest codebase, such as blacklist, whitelist.We should take the > > > opportunity and do our best to replace outdated terms with their more > > > inclusive alternatives.As you can see in [1] the TripleO project is > > > already working on this initiative, and I would like to work on this as > > > well on the tempest side. > > Thanks Arx for raising it. > > > > I always have hard time to understand the definition of 'outdated naming > > conventions ' are they outdated from coding language perspective or > > outdated as English language perspective? I do not see naming used in > > coding language should be matched with English as grammar/outdated/new > > style language. As long as they are not so bad (hurt anyone culture, > > abusing word etc) it is fine to keep them as it is and start adopting new > > names for new things we code. > > > > For me, naming convention are the things which always can be improved over > > time, none of the name is best suited for everyone in open source. But we > > need to understand whether it is worth to do in term of 1. effort of > > changing those 2. un- comfortness of adopting new names 3. again changing > > in future. > > > > At least from Tempest perspective, blacklist is very known common word used > > for lot of interfaces and dependent testing tool. I cannot debate on how > > good it is or bad but i can debate on not-worth to change now. For new > > interface, we can always use best-suggested name as per that > > time/culture/maintainers. We have tried few of such improvement in past but > > end up not-successful. Example: - > > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > > 7f493b6086cab9/tempest/test.py#L43 > > > > That's not the only used terminology for list of things, though. We could > always add new interfaces and keep the old ones are deprecated (but not > advertised) for the foreseable future. The old code won't be broken and the > new one would use the new terminology, I'd say it's a good solution. > > > -- > Luigi > > > From fungi at yuggoth.org Thu Jul 9 17:26:23 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 9 Jul 2020 17:26:23 +0000 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> Message-ID: <20200709172622.kbl2pdkycjp6gouv@yuggoth.org> On 2020-07-09 10:57:14 -0500 (-0500), Ghanshyam Mann wrote: [...] > I always have hard time to understand the definition of 'outdated > naming conventions' are they outdated from coding language > perspective or outdated as English language perspective? [...] It's a recently popular euphemism for words which make people uncomfortable. Unfortunately, rather than addressing the problem head on and admitting that's the primary driver for the change, it has become preferable to pretend that's not the impetus for wholesale replacements of established terminology (often in an attempt to avoid heated debate over the value of such changes). Don't get me wrong, I think it's entirely reasonable to replace words or phrases which make people uncomfortable, and in many cases it's an opportunity to improve our terminology by using words which have direct meaning rather than relying on computer science jargon based on idiom and loose analogy. Even if this comes at the cost of some engineering effort, it can be a long-term improvement. But let's not kid ourselves, we're replacing words because they're deemed offensive. It's disingenuous, even potentially insulting, to imply otherwise. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From arxcruz at redhat.com Thu Jul 9 17:34:55 2020 From: arxcruz at redhat.com (Arx Cruz) Date: Thu, 9 Jul 2020 19:34:55 +0200 Subject: [qa][tempest] Update language in tempest code base In-Reply-To: <173348b1df7.b5898c11633886.9175405555090897907@ghanshyammann.com> References: <173344b91ec.122943da3630997.4524106110681904507@ghanshyammann.com> <2383106.3VsfAaAtOV@whitebase.usersys.redhat.com> <173348b1df7.b5898c11633886.9175405555090897907@ghanshyammann.com> Message-ID: Well, at some point, it needs to break :) I was for a long time maintainer of gnome modules, more specifically zenity and in order to move forward with some functionalities we had to break stuff. We could not keep legacy code and move forward with new functionalities, and the gnome strategy is pretty simple: minor version, you must maintain api compatibility. Major version, let's break everything! The user can either stay in the version X.y.z, or update their code to version X+1.y.z. That's exactly what happened when gnome/gtk released the 3.x version, and what will happen with the future 4.x version. So, it's very hard to try new things, when you must maintain forever old things. The naming is for some people a problem, and we should make an effort to change that. Sometimes we don't see this as an issue, because it is so deeply rooted in our lives, that we don't see it as a problem. I'll give you an example we have in Brazil: One of the biggest children authors, known as Monteiro Lobato [1], was a very racist person, and he put all his racism in books, the books we have to read at school. So, in one of his famous books he has this character called Tia Anastácia, and another one the smart one called Pedrinho. So, Pedrinho always calls Tia Anastácia as: "That black lady" or: She is as black as a Gorilla, and people thought this was fine, and funny. And it was an official lecture in schools in Brazil, and even had a TV Show about it. I was one of those who watched and read those books, and always thought this was OKAY. Today, my daughter will never read Monteiro Lobato, and hopefully she will understand that is wrong if people call you "black as a Gorilla", no matter the context. Now, imagine you grow up reading these stories, how would you feel? ;) This is also right in code, you might not care, but there are people who are very sensible to some naming convention. Master/Slave may sound uncomfortable. Specially for people who have 400 years of slavery in their history. As an open source community, we should be able to fight against this, and make it a good code and environment for people who are new, and want to contribute, but not feel comfortable with some naming convention. You might say there's no such thing, but trust me they exist, and we should be working to make these people comfortable and welcome to our community. It's not about breaking code, it's about fixing it :) 1 - https://en.wikipedia.org/wiki/Monteiro_Lobato Kind regards, On Thu, Jul 9, 2020 at 7:06 PM Ghanshyam Mann wrote: > ---- On Thu, 09 Jul 2020 11:45:19 -0500 Arx Cruz > wrote ---- > > Yes, that's the idea. > > We can keep the old interface for a few cycles, with warning > deprecation message advertising to use the new one, and then remove in the > future. > > Deprecating things leads to two situations which really need some good > reason before doing it: > > - If we keep the deprecated interfaces working along with new interfaces > then it is confusion for users > as well as maintenance effort. In my experience, very less migration > happen to new things if old keep working. > > - If we remove them in future then it is breaking change. > > IMO, we need to first ask/analyse whether name changes are worth to do > with above things as results. Or in other > team we should first define what is 'outdated naming conventions' and how > worth to fix those. > > -gmann > > > > Kind regards, > > > > On Thu, Jul 9, 2020 at 6:15 PM Luigi Toscano > wrote: > > > > > > -- > > Arx Cruz > > Software Engineer > > Red Hat EMEA > > arxcruz at redhat.com > > @RedHat Red > Hat Red Hat > > > On Thursday, 9 July 2020 17:57:14 CEST Ghanshyam Mann wrote: > > > ---- On Thu, 09 Jul 2020 10:14:58 -0500 Arx Cruz > wrote > > > ---- > > > > Hello, > > > > I would like to start a discussion regarding the topic. > > > > At this moment in time we have an opportunity to be a more open and > > > > inclusive project by eliminating outdated naming conventions from > > > > tempest codebase, such as blacklist, whitelist.We should take the > > > > opportunity and do our best to replace outdated terms with their > more > > > > inclusive alternatives.As you can see in [1] the TripleO project is > > > > already working on this initiative, and I would like to work on > this as > > > > well on the tempest side. > > > Thanks Arx for raising it. > > > > > > I always have hard time to understand the definition of 'outdated > naming > > > conventions ' are they outdated from coding language perspective or > > > outdated as English language perspective? I do not see naming used in > > > coding language should be matched with English as grammar/outdated/new > > > style language. As long as they are not so bad (hurt anyone culture, > > > abusing word etc) it is fine to keep them as it is and start adopting > new > > > names for new things we code. > > > > > > For me, naming convention are the things which always can be improved > over > > > time, none of the name is best suited for everyone in open source. > But we > > > need to understand whether it is worth to do in term of 1. effort of > > > changing those 2. un- comfortness of adopting new names 3. again > changing > > > in future. > > > > > > At least from Tempest perspective, blacklist is very known common > word used > > > for lot of interfaces and dependent testing tool. I cannot debate on > how > > > good it is or bad but i can debate on not-worth to change now. For new > > > interface, we can always use best-suggested name as per that > > > time/culture/maintainers. We have tried few of such improvement in > past but > > > end up not-successful. Example: - > > > > https://opendev.org/openstack/tempest/src/commit/e1eebfa8451d4c28bef0669e4a > > > 7f493b6086cab9/tempest/test.py#L43 > > > > > > > That's not the only used terminology for list of things, though. We > could > > always add new interfaces and keep the old ones are deprecated (but not > > advertised) for the foreseable future. The old code won't be broken and > the > > new one would use the new terminology, I'd say it's a good solution. > > > > > > -- > > Luigi > > > > > > > > -- Arx Cruz Software Engineer Red Hat EMEA arxcruz at redhat.com @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From joh.scheuer at gmail.com Thu Jul 9 15:35:57 2020 From: joh.scheuer at gmail.com (Johannes Scheuermann) Date: Thu, 9 Jul 2020 17:35:57 +0200 Subject: Neutron Agent Migration Message-ID: Hi together, currently we exploring how we can reboot a compute node without any interruptions for the networking stack. We run Openstack Train with ml2 driver Linux bridge and dnsmasq for DHCP and internal DNS. The DHCP setup runs as high availability setup with 3 replicas. During our tests we identified the following challenges: 1.) If we reboot the machine without doing anything on the network layer all ports will be rescheduled. Also the networks will be removed from the (dead) agent and will be reassigned to another agent. But for each reboot we have some leftover ports with the device-id "reserved_dhcp_port". These ports can safely deleted (we haven't figured out where the issue in the neutron code is). 2.) If we disable the network agent like described here: https://docs.openstack.org/neutron/train/admin/config-dhcp-ha.html and then remove the disabled agent from all networks we have an even worse behaviour since the neutron scheduler doesn't reschedule the network to a different agent. So what is the correct way to ensure that the reboot of a node has no (or only small) interruptions to the networking service? The current issue is that if we remove one agent we might remove the port that is the first entry in the clients (VM's) resolv.conf which means that each request will be delayed by the default timeout. And is there any option to "migrate" a network from one agent to another? Thanks in advance, Johannes Scheuermann From zhangbailin at inspur.com Fri Jul 10 02:11:00 2020 From: zhangbailin at inspur.com (=?gb2312?B?QnJpbiBaaGFuZyjVxbDZwdYp?=) Date: Fri, 10 Jul 2020 02:11:00 +0000 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver Message-ID: Hi all: This release we want to introduce some 3rd party drivers (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. Due to the lack of CI test environment supported by hardware, we reached a temporary solution in two ways, as follows: 1. Provide a CI environment and provide a tempest test for Cyborg, this method is recommended; 2. If there is no CI environment, please provide the test results of this driver in the master branch or in the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. [1] http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-07-02-03.05.log.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From yumeng_bao at yahoo.com Fri Jul 10 05:37:04 2020 From: yumeng_bao at yahoo.com (yumeng bao) Date: Fri, 10 Jul 2020 13:37:04 +0800 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> Message-ID: <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> Brin, thanks for bringing this up! > Hi all: > This release we want to introduce some 3rd party drivers (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > Due to the lack of CI test environment supported by hardware, we reached a temporary solution in two ways, as follows: > 1. Provide a CI environment and provide a tempest test for Cyborg, this method is recommended; > 2. If there is no CI environment, please provide the test results of this driver in the master branch or in the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. Providing test result can be our option. The test result can be part of the driver documentation[0] as this is public to users. And from my understanding, the test result should work as the role of tempest case and clarify at least: necessary configuration,test operations and test results. [0] https://docs.openstack.org/cyborg/latest/reference/support-matrix.html#driver-support > [1] http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-07-02-03.05.log.html Regards, Yumeng From gouthampravi at gmail.com Fri Jul 10 05:37:35 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Thu, 9 Jul 2020 22:37:35 -0700 Subject: [manila][stable] moving some stable branches to EOL Message-ID: Hello Stackers, There have been no changes to the stable/ocata [1], driverfixes/mitaka [2], driverfixes/newton [3] and driverfixes/ocata [4] branches of openstack/manila in a year [1] and the manila team today decided [2] that it was time to close this branches. While we routinely get requests from users, vendors and distributions to backport bug fixes to older releases, no one seems to want any further changes in these branches. We'd also like stable/pike to be EOL'ed, the last change to that branch was a CVE fix made three months ago. Keeping these branches open may give the impression that we'd continue to take backports in, and support them with bugfixes, when the reality is that we're struggling to keep meaningful testing in stable/queens and stable/rocky branches - something we've seen most bugfix/backport requests for. If there are no objections, I'll propose an EOL patch and request the infra team to help delete these branches. Thanks, Goutham [1] https://opendev.org/openstack/manila/commits/branch/stable/ocata [2] https://opendev.org/openstack/manila/src/branch/driverfixes/mitaka [3] https://opendev.org/openstack/manila/src/branch/driverfixes/newton [4] https://opendev.org/openstack/manila/src/branch/driverfixes/ocata [2] http://eavesdrop.openstack.org/meetings/manila/2020/manila.2020-07-09-15.01.log.html#l-80 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Fri Jul 10 07:18:18 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 10 Jul 2020 09:18:18 +0200 Subject: Neutron Agent Migration In-Reply-To: References: Message-ID: <06314D7F-366B-4E64-95E7-4F979D315512@redhat.com> Hi, > On 9 Jul 2020, at 17:35, Johannes Scheuermann wrote: > > Hi together, > > currently we exploring how we can reboot a compute node without any interruptions for the networking stack. > We run Openstack Train with ml2 driver Linux bridge and dnsmasq for DHCP and internal DNS. > The DHCP setup runs as high availability setup with 3 replicas. > During our tests we identified the following challenges: > > 1.) > > If we reboot the machine without doing anything on the network layer all ports will be rescheduled. > Also the networks will be removed from the (dead) agent and will be reassigned to another agent. > But for each reboot we have some leftover ports with the device-id "reserved_dhcp_port". > These ports can safely deleted (we haven't figured out where the issue in the neutron code is). It’s done here https://opendev.org/openstack/neutron/src/branch/master/neutron/db/agentschedulers_db.py#L419 and it’s done by purpose. The issue may be that this reserved port should be used on the new agent so we should check why it isn’t and why new port is created for new agent. > > 2.) > > If we disable the network agent like described here: https://docs.openstack.org/neutron/train/admin/config-dhcp-ha.html > and then remove the disabled agent from all networks we have an even worse behaviour since the neutron scheduler doesn't reschedule the network to a different agent. > > So what is the correct way to ensure that the reboot of a node has no (or only small) interruptions to the networking service? > The current issue is that if we remove one agent we might remove the port that is the first entry in the clients (VM's) resolv.conf which means that each request will be delayed by the default timeout. > > And is there any option to "migrate" a network from one agent to another? You can manually remove networks from one agent with command like: $ neutron dhcp-agent-network-remove And then add it to the new one with: $ neutron dhcp-agent-network-add > > Thanks in advance, > > Johannes Scheuermann > > — Slawek Kaplonski Principal software engineer Red Hat From skaplons at redhat.com Fri Jul 10 08:16:12 2020 From: skaplons at redhat.com (Slawek Kaplonski) Date: Fri, 10 Jul 2020 10:16:12 +0200 Subject: [neutron] Team meeting - day change proposal Message-ID: Hi, During our last Monday team meeting I proposed to cancel this bi-weekly Monday meeting and have weekly meeting on Tuesday always. It is officially proposed in https://review.opendev.org/#/c/739780/ - if You usually attends those meetings and You didn’t check it yet, please do and give +1 if You are ok with such change. Thx in advance. — Slawek Kaplonski Principal software engineer Red Hat From tonyppe at gmail.com Fri Jul 10 08:18:53 2020 From: tonyppe at gmail.com (Tony Pearce) Date: Fri, 10 Jul 2020 16:18:53 +0800 Subject: [magnum] failed to launch Kubernetes cluster Message-ID: Hi team, I hope you are all keeping safe and well at the moment. I am trying to use magnum to launch a kubernetes cluster. I have tried different images but currently using Fedora-Atomic 27. The cluster deployment from the cluster template is failing and I am here to ask if you could please point me in the right direction? I have become stuck and I am uncertain how to further troubleshoot this. The cluster seems to fail a few minutes after booting up the master node because after I see the logs ([1],[2]), I do not see any progress in terms of new (different) logs or load on the master. Then the 60-minute timeout is reached and fails the cluster. I deployed this openstack stack using kayobe (kolla-ansible) and this is version Train. This is deployed on CentOS 7 within docker containers. Kayobe manages this deployment through the ansible playbooks. This was previously working some months back although I think I may have used coreos image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed. The only change being the hostname for controller node as updated in the inventory file for the kayobe. Since then which was a month or so back I've been unable to successfully deploy a kubernetes cluster. I've tried other fedora-atomic images as well as coreos without success. When using the coreos image and when tagging the image with the coreos tag as per the magnum docs, the instance fails to boot and goes to the rescue shell. However if I manually launch the coreos image then it does successfully boot and get configured via cloud-init. All of the deployment attempts stop at the same place when using fedora image and I have a different experience if I disable TLS: TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml TLS disabled: master and nodes launched but later fails. I didnt investigate this very much. When looking for help around the web, I found this which looks to be the same issue that I have at the moment (although he's deployed slightly differently, using centos8 and mentions magnum 10): https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ I have the same log messages on the master node within heat. When going through the troubleshooting guide I see that etcd is running and no errors however I dont see any flannel service at all. But I also don't know if this has simply failed before getting to deploy flannel or whether flannel is the reason. I did try to deploy using a cluster template that is using calico as a test but the same result from the logs. When looking at the stack via cli to see the failed stacks this is what I see there: http://paste.openstack.org/show/795736/ I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and 2GB memory. Storage is only via cinder as I am using iscsi storage with a cinder driver. I dont have any other storage. On the master, after the failure the heat log repeats these logs: ++ curl --silent http://127.0.0.1:8080/healthz + '[' ok = ok ']' + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}' error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable Trying to label master node with node-role.kubernetes.io/master="" + echo 'Trying to label master node with node-role.kubernetes.io/master=""' + sleep 5s [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ May I ask if anyone has a recent deployment of Magnum and a working deployment of kubernetes that could share with me the relevant details like the image you have used so that I can try and replicate? To create the cluster template I have been using: openstack coe cluster template create k8s-cluster-template \ --image Fedora-Atomic-27 \ --keypair testpair \ --external-network physnet2vlan20 \ --dns-nameserver 192.168.7.233 \ --flavor 2GB-2vCPU \ --docker-volume-size 15 \ --network-driver flannel \ --coe kubernetes If I have missed anything, I am happy to provide it. Many thanks in advance for any help or pointers on this. Regards, Tony Pearce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Fri Jul 10 08:24:34 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Fri, 10 Jul 2020 09:24:34 +0100 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: References: Message-ID: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> Hi Tony That is a known issue and is due to the default version of heat container agent baked into Train release. Please use label heat_container_agent_tag=train-stable-3 and you should be good to go. Cheers Bharat > On 10 Jul 2020, at 09:18, Tony Pearce wrote: > > Hi team, I hope you are all keeping safe and well at the moment. > > I am trying to use magnum to launch a kubernetes cluster. I have tried different images but currently using Fedora-Atomic 27. The cluster deployment from the cluster template is failing and I am here to ask if you could please point me in the right direction? I have become stuck and I am uncertain how to further troubleshoot this. The cluster seems to fail a few minutes after booting up the master node because after I see the logs ([1],[2]), I do not see any progress in terms of new (different) logs or load on the master. Then the 60-minute timeout is reached and fails the cluster. > > I deployed this openstack stack using kayobe (kolla-ansible) and this is version Train. This is deployed on CentOS 7 within docker containers. Kayobe manages this deployment through the ansible playbooks. > > This was previously working some months back although I think I may have used coreos image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed. The only change being the hostname for controller node as updated in the inventory file for the kayobe. > Since then which was a month or so back I've been unable to successfully deploy a kubernetes cluster. I've tried other fedora-atomic images as well as coreos without success. When using the coreos image and when tagging the image with the coreos tag as per the magnum docs, the instance fails to boot and goes to the rescue shell. However if I manually launch the coreos image then it does successfully boot and get configured via cloud-init. All of the deployment attempts stop at the same place when using fedora image and I have a different experience if I disable TLS: > > TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml > > TLS disabled: master and nodes launched but later fails. I didnt investigate this very much. > > When looking for help around the web, I found this which looks to be the same issue that I have at the moment (although he's deployed slightly differently, using centos8 and mentions magnum 10): > https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ > > I have the same log messages on the master node within heat. > > When going through the troubleshooting guide I see that etcd is running and no errors however I dont see any flannel service at all. But I also don't know if this has simply failed before getting to deploy flannel or whether flannel is the reason. I did try to deploy using a cluster template that is using calico as a test but the same result from the logs. > > When looking at the stack via cli to see the failed stacks this is what I see there: http://paste.openstack.org/show/795736/ > > I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and 2GB memory. > Storage is only via cinder as I am using iscsi storage with a cinder driver. I dont have any other storage. > > On the master, after the failure the heat log repeats these logs: > > ++ curl --silent http://127.0.0.1:8080/healthz > + '[' ok = ok ']' > + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master ": ""}}}' > error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable > Trying to label master node with node-role.kubernetes.io/master= "" > + echo 'Trying to label master node with node-role.kubernetes.io/master= ""' > + sleep 5s > > [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ > [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ > > May I ask if anyone has a recent deployment of Magnum and a working deployment of kubernetes that could share with me the relevant details like the image you have used so that I can try and replicate? > > To create the cluster template I have been using: > openstack coe cluster template create k8s-cluster-template \ > --image Fedora-Atomic-27 \ > --keypair testpair \ > --external-network physnet2vlan20 \ > --dns-nameserver 192.168.7.233 \ > --flavor 2GB-2vCPU \ > --docker-volume-size 15 \ > --network-driver flannel \ > --coe kubernetes > > > If I have missed anything, I am happy to provide it. > > Many thanks in advance for any help or pointers on this. > > Regards, > > Tony Pearce > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Fri Jul 10 12:32:24 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Fri, 10 Jul 2020 07:32:24 -0500 Subject: [tc] [all] Topics for Cross Community Discussion with Kubernetes ... Message-ID: All, Recently, the OpenStack TC has reached out to the Kubernetes Steering Committee for input as we have proposed adding a starter-kit:kubernetes-in-virt tag for projects in OpenStack. This request was received positively and as a result the TC has started brainstorming other topics that we could approach with the k8s community in this [1] etherpad. If you have topics that may be appropriate for this discussion please see the etherpad and add your ideas. Thanks! Jay IRC: jungleboyj [1] https://etherpad.opendev.org/p/kubernetes-cross-community-topics From smooney at redhat.com Fri Jul 10 12:58:15 2020 From: smooney at redhat.com (Sean Mooney) Date: Fri, 10 Jul 2020 13:58:15 +0100 Subject: [cyborg] Temporary treatment plan for the 3rd-party driver In-Reply-To: <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> Message-ID: <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com> On Fri, 2020-07-10 at 13:37 +0800, yumeng bao wrote: > Brin, thanks for bringing this up! > > > Hi all: > > This release we want to introduce some 3rd party drivers (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) > > in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > > Due to the lack of CI test environment supported by hardware, we reached a temporary solution in two ways, as > > follows: > > 1. Provide a CI environment and provide a tempest test for Cyborg, this method is recommended; > > 2. If there is no CI environment, please provide the test results of this driver in the master branch or in the > > designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation > > of the commit. > > Providing test result can be our option. The test result can be part of the driver documentation[0] as this is public > to users. > And from my understanding, the test result should work as the role of tempest case and clarify at least: necessary > configuration,test operations and test results. i would advise against including the resulsts in docuemntation add int test results to a commit or provideing tiem at the poitn it merged just tells you it once worked on the developers system likely using devstack to deploy. it does not tell you that it still work after even a singel addtional commit has been merged. so i would sugges not adding the results to the docs as they will get out dateded quickly. maintaining a wiki is fine but i woudl suggest considring any driver that does not have first or thirdparty ci to be experimental. the generic mdev driver we talked about can be tested using sampel kernel modules that provide realy mdevs implemnetaion of srial consoles or graphics devices. so it could be validated in first party ci and consider supported/non experimaental. if other driver can similarly be tested with virtual hardware or sample kernel modules that allowed testing in the first party ci they could alos be marked as fully supported. with out that level of testing however i would not advertise a driver as anything more then experimental. the old rule when i started working on openstack was if its not tested in ci its broken. > > [0] https://docs.openstack.org/cyborg/latest/reference/support-matrix.html#driver-support > > > > [1] http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-07-02-03.05.log.html > > Regards, > Yumeng > From ionut at fleio.com Fri Jul 10 13:50:32 2020 From: ionut at fleio.com (Ionut Biru) Date: Fri, 10 Jul 2020 16:50:32 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi again, I did not manage to make it work, I cannot figure out how to connect all the pieces. pollsters.d/octavia.yaml https://paste.xinu.at/DERxh1/ pipeline.yaml https://paste.xinu.at/u1E42/ polling.yaml https://paste.xinu.at/MZWNs/ gnocchi_resources.yaml https://paste.xinu.at/j3AX/ gnocchi_client.py in resources_update_operations https://paste.xinu.at/no5/ gnocchi resource-type show https://paste.xinu.at/7mZIyZ/ Do you mind if you do a full example using "dynamic.network.services.vpn.connection" from https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html ? Or maybe you can point me to the mistakes made in my configuration? On Tue, Jul 7, 2020 at 2:43 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > That is the right direction. I don't know why people hard-coded the > initial pollsters' configs and did not document the relation between > Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a > single system, but interdependent systems to implement a monitoring > solution. Ceilometer is the component that gathers data/information, > processes, and then persists it somewhere. Gnocchi is one of the options > that Ceilometer can use to persist data. By default, Ceilometer creates > some basic configurations in Gnocchi to store data, such as some default > resource-types with default attributes. However, we do not need (should > not) rely on this default config. > > You can create and use custom resources to fit the stack to your needs. > This can be achieved via `gnocchi resource-type create -a > :: ` and > `gnocchi resource-type create -u > :: `. > Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), > you can customize the mapping of metrics to resource-types in Gnocchi. > > On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: > >> Hello again, >> >> What's the proper way to handle dynamic pollsters in gnocchi ? >> Right now ceilometer returns: >> >> WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia >> is not handled by Gnocchi >> >> I found >> https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html >> but I'm not sure if is the right direction. >> >> On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: >> >>> Seems to work fine now. Thanks. >>> >>> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >>> rafaelweingartner at gmail.com> wrote: >>> >>>> It looks like a coding error that we left behind during a major >>>> refactoring that we introduced upstream. >>>> I created a patch for it. Can you check/review and test it? >>>> https://review.opendev.org/739555 >>>> >>>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>>> >>>>> Hi Rafael, >>>>> >>>>> I have an error and I cannot resolve it myself. >>>>> >>>>> https://paste.xinu.at/LEfdXD/ >>>>> >>>>> Do you happen to know what's wrong? >>>>> >>>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>>> >>>>> >>>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> Good catch. I fixed the docs. >>>>>> https://review.opendev.org/#/c/739288/ >>>>>> >>>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I just noticed that the example >>>>>>> dynamic.network.services.vpn.connection from >>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>>> the wrong indentation. >>>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>>> >>>>>>> Now I have to see why is not polling from it >>>>>>> >>>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>>> >>>>>>>> Hi Rafael, >>>>>>>> >>>>>>>> I think I applied all the reviews successfully but I tried to do an >>>>>>>> octavia dynamic poller but I have couples of errors. >>>>>>>> >>>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>>> Error is about syntax error near name: >>>>>>>> https://paste.xinu.at/MHgDBY/ >>>>>>>> >>>>>>>> if i remove the - in front of name like this: >>>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>>> >>>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>> and those will be available for victoria? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>>>> >>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>> production? >>>>>>>>>> >>>>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>>>> the code knows what she/he is doing, you should be safe. The guys that are >>>>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>>>> have a few openstack components that are customized with the >>>>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>>>> not using the community version, but something in-between (the community >>>>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>>>> quality for production. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru wrote: >>>>>>>>> >>>>>>>>>> Hello Rafael, >>>>>>>>>> >>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>> and those will be available for victoria? >>>>>>>>>> >>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>> production? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>>> that might be important for your use case: >>>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer >>>>>>>>>>>> to the Ceilometer project. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I found >>>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I was wondering if there is a way to meter >>>>>>>>>>>>>>> loadbalancers from octavia. >>>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>>>> returns the full load balancer status tree [1]. >>>>>>>>>>>>>> Additionally, Octavia has three API endpoints for statistics >>>>>>>>>>>>>> [2][3][4]. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Carlos >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>>> [2] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>>> [3] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>>> [4] >>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Rafael Weingärtner >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Rafael Weingärtner >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From dev.faz at gmail.com Fri Jul 10 14:01:26 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Fri, 10 Jul 2020 16:01:26 +0200 Subject: [octavia] Replace broken amphoras Message-ID: Hi, we had some network issues and now have amphoras which are marked in ERROR state. What we already tried: - failover the amphora - failover the loadbalancer both did not work, got "unable to attach port to (new) amphora". Then we removed the vrrp_port, set the vrrp_port_id to NULL and repeated the amphora failover Reverting Err: "PortID: Null" Then we created a new vrrp_port as described [1] and added the port-id to the vrrp_port_id and the a suitable vrrp_ip field to our ERRORed amphora entry. Restarted failover -> without luck. Currently we have an single STANDALONE amphora configured. Is there a way to trigger octavia to create new "clean" amphoras for MASTER/BACKUP? Thanks, Fabian [1] http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Fri Jul 10 14:03:10 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 10 Jul 2020 09:03:10 -0500 Subject: [release] Release countdown for week R-13 July 13 - July 17 Message-ID: <20200710140310.GA2336490@sm-workstation> Development Focus ----------------- The Victoria-2 milestone will happen in a few weeks, on July 30. Victoria-related specs should now be finalized so that teams can move to implementation ASAP. Some teams observe specific deadlines on the second milestone (mostly spec freezes): please refer to https://releases.openstack.org/victoria/schedule.html for details. General Information ------------------- Please remember that libraries need to be released at least once per milestone period. At milestone 2, the release team will propose releases for any library that has not been otherwise released since milestone 1. Other non-library deliverables that follow the cycle-with-intermediary release model should have an intermediary release before milestone-2. Those who haven't will be proposed to switch to the cycle-with-rc model, which is more suited to deliverables that are released only once per cycle. At milestone-2 we also freeze the contents of the final release. If you have a new deliverable that should be included in the final release, you should make sure it has a deliverable file in: https://opendev.org/openstack/releases/src/branch/master/deliverables/victoria You should request a beta release (or intermediary release) for those new deliverables by milestone-2. We understand some may not be quite ready for a full release yet, but if you have something minimally viable to get released it would be good to do a 0.x release to exercise the release tooling for your deliverables. See the MembershipFreeze description for more details: https://releases.openstack.org/victoria/schedule.html#v-mf Finally, now may be a good time for teams to check on any stable releases that need to be done for your deliverables. If you have bugfixes that have been backported, but no stable release getting those. If you are unsure what is out there committed but not released, in the openstack/releases repo, running the command "tools/list_stable_unreleased_changes.sh " gives a nice report. Upcoming Deadlines & Dates -------------------------- Victoria-2 milestone: July 30 From rosmaita.fossdev at gmail.com Fri Jul 10 16:11:03 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 10 Jul 2020 12:11:03 -0400 Subject: [cinder] monthly video meeting poll results Message-ID: tl;dr - our first video meeting will be Wednesday 29 July connection info will be on the agenda etherpad, https://etherpad.opendev.org/p/cinder-victoria-meetings For those who didn't see the poll, this is what it was about: We're considering holding the Cinder weekly meeting as a video conference once each month. It will be the last meeting of each month and will take place at the regularly scheduled meeting time (1400 UTC for 60 minutes). Video Meeting Rules: * Everyone will keep IRC open during the meeting. * We'll take notes in IRC to leave a record similar to what we have for our regular IRC meetings. * Some people are more comfortable communicating in written English. So at any point, any attendee may request that the discussion of the current topic be conducted entirely in IRC. The results: Do it? - 50% in favor, 33% in strong favor, 17% don't care, no one opposed. Record? - 50% yes, 50% don't care Conferencing software? - Bluejeans: first choice of 70% of respondents Comments - Let's work hard to write what we speak! - people who don't want to be recorded can turn their camera off - video conference plus IRC is for sure better than IRC only - Zoom is shady and possibly not appropriate for an open source project that wants to welcome contributors from all countries. I think we're better off avoiding it. Conclusion: We'll hold the Cinder weekly meeting for 29 July in BlueJeans *and* IRC following the ground rules laid out above, and continue doing the same for the last meeting of each month through the end of the Victoria cycle. The meetings will be recorded. From pramchan at yahoo.com Fri Jul 10 16:55:08 2020 From: pramchan at yahoo.com (prakash RAMCHANDRAN) Date: Fri, 10 Jul 2020 16:55:08 +0000 (UTC) Subject: [all][InteropWG] weekly Friday call - request for Partciapation References: <1093529208.5519657.1594400108326.ref@mail.yahoo.com> Message-ID: <1093529208.5519657.1594400108326@mail.yahoo.com> Hi all, We have the agenda listed on https://etherpad.opendev.org/p/interop Call is now moved to meetpad and will be easy to acess and try it out today in next 10-15 minutes if you find time and interest. OpenStack  / Open Infra -interopWGInterop Working Group - Weekly Friday 10-11 AM or UTC 17-18 Lnink: https://meetpad.opendev.org/Interop-WG-weekly-meeting ThanksPrakash -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Fri Jul 10 17:29:41 2020 From: mark at stackhpc.com (Mark Goddard) Date: Fri, 10 Jul 2020 18:29:41 +0100 Subject: [kolla] PTL holiday Message-ID: Hi, I will be on holiday next Tuesday (14th) to Thursday (16th). I will therefore miss both the IRC meeting and Kolla Klub. If someone is able to chair the IRC meeting, please reply here. There is currently nothing on the agenda for the Klub, so anyone looking to chair that meeting will also need to find some topics to cover. We have suggestions in [1]. Cheers, Mark [1] https://docs.google.com/document/d/1EwQs2GXF-EvJZamEx9vQAOSDB5tCjsDCJyHQN5_4_Sw From anilj.mailing at gmail.com Fri Jul 10 17:44:35 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Fri, 10 Jul 2020 10:44:35 -0700 Subject: OpenStack cluster event notification In-Reply-To: <59G5DQ.5J8F1FJXF7IT3@est.tech> References: <59G5DQ.5J8F1FJXF7IT3@est.tech> Message-ID: Hello Gibi. I looked at your sample code. Other than providing the username password of the user in transport url, *transport = oslo_messaging.get_notification_transport(cfg.CONF, url='rabbit://stackrabbit:admin at 100.109.0.10:5672/ ')* What changes are to be done in the nova.conf file? Can you please provide the exact set of changes? /anil. On Wed, Jul 8, 2020 at 5:05 AM Balázs Gibizer wrote: > > > On Tue, Jul 7, 2020 at 16:32, Julia Kreger > wrote: > [snip] > > > > > Although that being said, I don't think much would really prevent you > > from consuming the notifications directly from the message bus, if you > > so desire. Maybe someone already has some code for this on hand. > > Here is some example code that forwards the nova versioned > notifications from the message bus out to a client via websocket [1]. I > used this sample code in my demo [2] during a summit presentation. > > Cheers, > gibi > > [1] > > https://github.com/gibizer/nova-notification-demo/blob/master/ws_forwarder.py > [2] https://www.youtube.com/watch?v=WFq5JWXa9AM > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Fri Jul 10 17:24:17 2020 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Fri, 10 Jul 2020 14:24:17 -0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Sure, this is a minimalistic config I used for testing (watch for the indentation issues that might happen due to copy/paste into Gmail). > cat ceilometer/pollsters.d/vpn-connection-dynamic-pollster.yaml > --- > > - name: "dynamic_pollster.network.services.vpn.connection" > sample_type: "gauge" > unit: "ipsec_site_connection" > value_attribute: "status" > endpoint_type: "network" > url_path: "v2.0/vpn/ipsec-site-connections" > metadata_fields: > - "name" > - "vpnservice_id" > - "description" > - "status" > - "peer_address" > value_mapping: > ACTIVE: "1" > DOWN: "0" > metadata_mapping: > name: "display_name" > default_value: 0 > Then, the polling.yaml file cat ceilometer/polling.yaml | grep -A 3 vpnass > - name: vpnass_pollsters > interval: 600 > meters: > - dynamic_pollster.network.services.vpn.connection > And last, but not least, the custom_gnocchi_resources file. > cat ceilometer/custom_gnocchi_resources.yaml | grep -B 2 -A 9 > "dynamic_pollster.network.services.vpn.connection" > - resource_type: s2svpn > metrics: > dynamic_pollster.network.services.vpn.connection: > attributes: > name: resource_metadata.name > vpnservice_id: resource_metadata.vpnservice_id > description: resource_metadata.description > status: resource_metadata.status > peer_address: resource_metadata.peer_address > display_name: resource_metadata.display_name > Bear in mind that you need to create the Gnocchi resource type. > gnocchi resource-type show s2svpn > > +--------------------------+-----------------------------------------------------------+ > | Field | Value > | > > +--------------------------+-----------------------------------------------------------+ > | attributes/description | max_length=255, min_length=0, required=False, > type=string | > | attributes/display_name | max_length=255, min_length=0, required=False, > type=string | > | attributes/name | max_length=255, min_length=0, required=False, > type=string | > | attributes/peer_address | max_length=255, min_length=0, required=False, > type=string | > | attributes/status | max_length=255, min_length=0, required=False, > type=string | > | attributes/vpnservice_id | required=False, type=uuid > | > | name | s2svpn > | > | state | active > | > > +--------------------------+-----------------------------------------------------------+ > What is the problem you are having? On Fri, Jul 10, 2020 at 10:50 AM Ionut Biru wrote: > Hi again, > > I did not manage to make it work, I cannot figure out how to connect all > the pieces. > > pollsters.d/octavia.yaml https://paste.xinu.at/DERxh1/ > pipeline.yaml https://paste.xinu.at/u1E42/ > polling.yaml https://paste.xinu.at/MZWNs/ > gnocchi_resources.yaml https://paste.xinu.at/j3AX/ > gnocchi_client.py in resources_update_operations > https://paste.xinu.at/no5/ > gnocchi resource-type show https://paste.xinu.at/7mZIyZ/ > Do you mind if you do a full example > using "dynamic.network.services.vpn.connection" from > https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html > ? > > Or maybe you can point me to the mistakes made in my configuration? > > > On Tue, Jul 7, 2020 at 2:43 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> That is the right direction. I don't know why people hard-coded the >> initial pollsters' configs and did not document the relation between >> Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a >> single system, but interdependent systems to implement a monitoring >> solution. Ceilometer is the component that gathers data/information, >> processes, and then persists it somewhere. Gnocchi is one of the options >> that Ceilometer can use to persist data. By default, Ceilometer creates >> some basic configurations in Gnocchi to store data, such as some default >> resource-types with default attributes. However, we do not need (should >> not) rely on this default config. >> >> You can create and use custom resources to fit the stack to your needs. >> This can be achieved via `gnocchi resource-type create -a >> :: ` and >> `gnocchi resource-type create -u >> :: `. >> Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), >> you can customize the mapping of metrics to resource-types in Gnocchi. >> >> On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: >> >>> Hello again, >>> >>> What's the proper way to handle dynamic pollsters in gnocchi ? >>> Right now ceilometer returns: >>> >>> WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia >>> is not handled by Gnocchi >>> >>> I found >>> https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html >>> but I'm not sure if is the right direction. >>> >>> On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: >>> >>>> Seems to work fine now. Thanks. >>>> >>>> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >>>> rafaelweingartner at gmail.com> wrote: >>>> >>>>> It looks like a coding error that we left behind during a major >>>>> refactoring that we introduced upstream. >>>>> I created a patch for it. Can you check/review and test it? >>>>> https://review.opendev.org/739555 >>>>> >>>>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>>>> >>>>>> Hi Rafael, >>>>>> >>>>>> I have an error and I cannot resolve it myself. >>>>>> >>>>>> https://paste.xinu.at/LEfdXD/ >>>>>> >>>>>> Do you happen to know what's wrong? >>>>>> >>>>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>>>> >>>>>> >>>>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>>>> rafaelweingartner at gmail.com> wrote: >>>>>> >>>>>>> Good catch. I fixed the docs. >>>>>>> https://review.opendev.org/#/c/739288/ >>>>>>> >>>>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I just noticed that the example >>>>>>>> dynamic.network.services.vpn.connection from >>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>>>> the wrong indentation. >>>>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>>>> >>>>>>>> Now I have to see why is not polling from it >>>>>>>> >>>>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>>>> >>>>>>>>> Hi Rafael, >>>>>>>>> >>>>>>>>> I think I applied all the reviews successfully but I tried to do >>>>>>>>> an octavia dynamic poller but I have couples of errors. >>>>>>>>> >>>>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>>>> Error is about syntax error near name: >>>>>>>>> https://paste.xinu.at/MHgDBY/ >>>>>>>>> >>>>>>>>> if i remove the - in front of name like this: >>>>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>>>> >>>>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I would say so. We are lacking people to review and then merge it. >>>>>>>>>> >>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>> production? >>>>>>>>>>> >>>>>>>>>> As long as the person executing the cherry-picks, and maintaining >>>>>>>>>> the code knows what she/he is doing, you should be safe. The guys that are >>>>>>>>>> using this implementation (and others that I and my colleagues proposed), >>>>>>>>>> have a few openstack components that are customized with the >>>>>>>>>> patches/enhancements/extensions we developed so far; this means, they are >>>>>>>>>> not using the community version, but something in-between (the community >>>>>>>>>> releases + the patches we did). Of course, it is only possible, because we >>>>>>>>>> are the ones creating and maintaining these codes; therefore, we can assure >>>>>>>>>> quality for production. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hello Rafael, >>>>>>>>>>> >>>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>> >>>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>> production? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>>>> that might be important for your use case: >>>>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll defer >>>>>>>>>>>>> to the Ceilometer project. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I found >>>>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I was wondering if there is a way to meter >>>>>>>>>>>>>>>> loadbalancers from octavia. >>>>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer was >>>>>>>>>>>>>>>> deployed and has status active. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You can get the provisioning and operating status of Octavia >>>>>>>>>>>>>>> load balancers via the Octavia API. There is also an API endpoint that >>>>>>>>>>>>>>> returns the full load balancer status tree [1]. >>>>>>>>>>>>>>> Additionally, Octavia has three API endpoints for >>>>>>>>>>>>>>> statistics [2][3][4]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Carlos >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Rafael Weingärtner >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ionut Biru - https://fleio.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Rafael Weingärtner >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ionut Biru - https://fleio.com >>>>>> >>>>> >>>>> >>>>> -- >>>>> Rafael Weingärtner >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Ionut Biru - https://fleio.com >>> >> >> >> -- >> Rafael Weingärtner >> > > > -- > Ionut Biru - https://fleio.com > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Fri Jul 10 19:55:19 2020 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 10 Jul 2020 21:55:19 +0200 Subject: [blazar] IRC meetings cancelled next week Message-ID: Hello, As I will be on holiday next week (July 13-17), I have proposed that both IRC meetings are cancelled. We will meet again on Tuesday July 21 for EMEA and Thursday July 30 for Americas. Cheers, Pierre (priteau) From paye600 at gmail.com Fri Jul 10 20:23:45 2020 From: paye600 at gmail.com (Roman Gorshunov) Date: Fri, 10 Jul 2020 22:23:45 +0200 Subject: [loci][helm][k8s] When do images on docker.io get updated In-Reply-To: <61872A8F-5495-4C6E-AD86-14A61F9431A1@gmail.com> References: <7261df59-de91-345f-02e7-19885404d5d2@dantalion.nl> <61872A8F-5495-4C6E-AD86-14A61F9431A1@gmail.com> Message-ID: Hello Corne, The loci images are now updated [0]. Thanks to Andrii Ostapenko and reviewers. [0] https://hub.docker.com/u/loci Best regards, Roman Gorshunov On Thu, Jul 2, 2020 at 12:35 PM Roman Gorshunov wrote: > > Hello Corne, > > Thank you for your email. i have investigated the issue, and seems that we have image push broken for some time. > While we work on resolution, I could advice you to locally build images, if that suits you. > > I would post a reply here to the mailing list once issue is resolved. > Again, thank you for paying attention and informing us. > > Best regards, > Roman Gorshunov > From zhangbailin at inspur.com Sat Jul 11 01:41:41 2020 From: zhangbailin at inspur.com (=?utf-8?B?QnJpbiBaaGFuZyjlvKDnmb7mnpcp?=) Date: Sat, 11 Jul 2020 01:41:41 +0000 Subject: =?utf-8?B?562U5aSNOiBbY3lib3JnXSBUZW1wb3JhcnkgdHJlYXRtZW50IHBsYW4gZm9y?= =?utf-8?Q?_the_3rd-party_driver?= In-Reply-To: <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com> References: <94B50EE3-F888-4BFA-908C-10B416096A64.ref@yahoo.com> <94B50EE3-F888-4BFA-908C-10B416096A64@yahoo.com> <91e7b70d6dea95fce428511010bfa8e0cf2ce4e4.camel@redhat.com> Message-ID: On Fri, 2020-07-10 at 13:37 +0800, yumeng bao wrote: > Brin, thanks for bringing this up! > > > Hi all: > > This release we want to introduce some 3rd party drivers > > (e.g. Intel QAT, Inspur FPGA, and Inspur SSD etc.) in Cyborg, and we discussed the handling of 3rd-party driver CI in Cyborg IRC meeting [1]. > > Due to the lack of CI test environment supported by hardware, > > we reached a temporary solution in two ways, as > > follows: > > 1. Provide a CI environment and provide a tempest test for Cyborg, > > this method is recommended; 2. If there is no CI environment, please > > provide the test results of this driver in the master branch or in > > the designated branch, which should be as complete as possible, sent to the Cyborg team, or pasted in the implementation of the commit. > > Providing test result can be our option. The test result can be part > of the driver documentation[0] as this is public to users. > And from my understanding, the test result should work as the role of > tempest case and clarify at least: necessary configuration,test operations and test results. > i would advise against including the resulsts in docuemntation add int test results to a commit or provideing tiem at the poitn it merged just tells you it once worked on the developers system likely using devstack to deploy. it does not tell you that it still work after even a singel addtional commit has been merged. so i would sugges not adding the results to the docs as they will get out dateded quickly. Good advice, this is also my original intention. Give the result verification in the submitted commit, and do not put the test verification result in the code base. As you said, this does not mean that it will always work unless a test report can be provided regularly. Of course, it is better if there is a third-party CI , we will try our best to fight for it. > maintaining a wiki is fine but i woudl suggest considring any driver that does not have first or thirdparty ci to be experimental. the generic mdev driver we talked about can be tested using sampel kernel modules that provide realy mdevs implemnetaion of srial consoles or graphics devices. so it could be validated in first party ci and consider supported/non experimaental. if other driver can similarly be tested with virtual hardware or sample kernel modules that allowed testing in the first party ci they could alos be marked as fully supported. with out that level of testing however i would not advertise a driver as anything more then experimental. > the old rule when i started working on openstack was if its not tested in ci its broken. > > [0] > https://docs.openstack.org/cyborg/latest/reference/support-matrix.html > #driver-support > > > > [1] > > http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openst > > ack_cyborg.2020-07-02-03.05.log.html > > Regards, > Yumeng > From radoslaw.piliszek at gmail.com Sat Jul 11 10:40:10 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sat, 11 Jul 2020 12:40:10 +0200 Subject: [kolla] PTL holiday In-Reply-To: References: Message-ID: On Fri, Jul 10, 2020 at 7:39 PM Mark Goddard wrote: > > Hi, > > I will be on holiday next Tuesday (14th) to Thursday (16th). I will > therefore miss both the IRC meeting and Kolla Klub. If someone is able > to chair the IRC meeting, please reply here. There is currently > nothing on the agenda for the Klub, so anyone looking to chair that > meeting will also need to find some topics to cover. We have > suggestions in [1]. I agree to chair them both. For Klub I suggest we run an open discussion panel. There is usually something to talk about but the formal agenda might sound scary. :-) -yoctozepto From reza.b2008 at gmail.com Sat Jul 11 12:55:36 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Sat, 11 Jul 2020 17:25:36 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: I found following error in ironic and container-puppet-ironic container log during installation: puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: > Hi, > > I'm going to install OpenStack Train with the help of TripleO on CentOS 8, > but undercloud installation fails with the following error: > > "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", > "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ > exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- > Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 > -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: > 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 > 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 > 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} > > Any suggestion would be grateful. > Regards, > Reza > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Sat Jul 11 16:44:48 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Sat, 11 Jul 2020 11:44:48 -0500 Subject: [cinder] monthly video meeting poll results In-Reply-To: References: Message-ID: Brian, Thanks for putting this together and for the summary. Look forward to seeing you all later this month.  :-) Jay On 7/10/2020 11:11 AM, Brian Rosmaita wrote: > tl;dr - our first video meeting will be Wednesday 29 July >   connection info will be on the agenda etherpad, >   https://etherpad.opendev.org/p/cinder-victoria-meetings > > For those who didn't see the poll, this is what it was about: > > We're considering holding the Cinder weekly meeting as a video > conference once each month. It will be the last meeting of each month > and will take place at the regularly scheduled meeting time (1400 UTC > for 60 minutes). > > Video Meeting Rules: > * Everyone will keep IRC open during the meeting. > * We'll take notes in IRC to leave a record similar to what we have > for our regular IRC meetings. > * Some people are more comfortable communicating in written English. > So at any point, any attendee may request that the discussion of the > current topic be conducted entirely in IRC. > > The results: > Do it? > - 50% in favor, 33% in strong favor, 17% don't care, no one opposed. > Record? > - 50% yes, 50% don't care > Conferencing software? > - Bluejeans: first choice of 70% of respondents > Comments > - Let's work hard to write what we speak! > - people who don't want to be recorded can turn their camera off > - video conference plus IRC is for sure better than IRC only > - Zoom is shady and possibly not appropriate for an open source > project that wants to welcome contributors from all countries. I think > we're better off avoiding it. > > Conclusion: > We'll hold the Cinder weekly meeting for 29 July in BlueJeans *and* > IRC following the ground rules laid out above, and continue doing the > same for the last meeting of each month through the end of the > Victoria cycle.  The meetings will be recorded. > From aschultz at redhat.com Sun Jul 12 21:09:45 2020 From: aschultz at redhat.com (Alex Schultz) Date: Sun, 12 Jul 2020 15:09:45 -0600 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: I don't believe centos8 containers are available for Train yet. The error you're hitting is because it's fetching centos7 containers and the ironic container is not backwards compatible between the two versions. If you want centos8, use Ussuri. On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi wrote: > > I found following error in ironic and container-puppet-ironic container log during installation: > > puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 > > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: >> >> Hi, >> >> I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: >> >> "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} >> >> Any suggestion would be grateful. >> Regards, >> Reza >> >> From marios at redhat.com Mon Jul 13 06:20:14 2020 From: marios at redhat.com (Marios Andreou) Date: Mon, 13 Jul 2020 09:20:14 +0300 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: Hi folks, On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: > I don't believe centos8 containers are available for Train yet. The > error you're hitting is because it's fetching centos7 containers and > the ironic container is not backwards compatible between the two > versions. If you want centos8, use Ussuri. > > fyi we started pushing centos8 train last week - slightly different namespace - latest current-tripleo containers are pushed to https://hub.docker.com/u/tripleotraincentos8 hope it helps > On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi > wrote: > > > > I found following error in ironic and container-puppet-ironic container > log during installation: > > > > puppet-user: Error: > /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: > Could not evaluate: Could not retrieve information from environment > production source(s) file:/tftpboot/ldlinux.c32 > > > > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi > wrote: > >> > >> Hi, > >> > >> I'm going to install OpenStack Train with the help of TripleO on CentOS > 8, but undercloud installation fails with the following error: > >> > >> "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ > '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit > 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying > running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed > running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- > Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 > ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: > 95117 -- ERROR configuring zaqar"]} > >> > >> Any suggestion would be grateful. > >> Regards, > >> Reza > >> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyppe at gmail.com Mon Jul 13 07:43:51 2020 From: tonyppe at gmail.com (Tony Pearce) Date: Mon, 13 Jul 2020 15:43:51 +0800 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> References: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> Message-ID: Hi Bharat, many thanks for your super quick response to me last week. I really appreciate that, especially since I had been trying for so long on this issue here. I wanted to try out your suggestion before coming back and creating a reply. I tried your suggestion and at first, I got the same experience (failure) when creating a cluster. It appeared to stop in the same place as I described in the mail previous. I noticed some weird things with DNS integration (Designate) during the investigation [1] and [2]. I decided to remove Designate from Openstack and retest and now I am successfully able to deploy a kubernetes cluster! :) Regarding those 2 points: [1] - the configured designate zone was project.cloud.company.com and instance1 would be instance1.project.cloud.company.com however, the kube master instance hostname was getting master.cloud.company.com [2] - when doing a dns lookup on master.project.cloud.company.com the private IP was being returned instead of the floating IP. This meant that from outside the project, the instance couldnt be pinged by hostname. I've removed both magnum and Designate and then redeployed both by first deploying Magnum and testing successful kubernetes cluster deployment using your fix Bharat. Then I deployed Designate again. Issue [1] is still present while issue [2] is resolved and no longer present. Kubernetes cluster deployment is still successful :) Thank you once again and have a great week ahead! Kind regards, Tony Pearce On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar wrote: > Hi Tony > > That is a known issue and is due to the default version of heat container > agent baked into Train release. Please use label > heat_container_agent_tag=train-stable-3 and you should be good to go. > > Cheers > > Bharat > > On 10 Jul 2020, at 09:18, Tony Pearce wrote: > > Hi team, I hope you are all keeping safe and well at the moment. > > I am trying to use magnum to launch a kubernetes cluster. I have tried > different images but currently using Fedora-Atomic 27. The cluster > deployment from the cluster template is failing and I am here to ask if you > could please point me in the right direction? I have become stuck and I am > uncertain how to further troubleshoot this. The cluster seems to fail a few > minutes after booting up the master node because after I see the logs > ([1],[2]), I do not see any progress in terms of new (different) logs or > load on the master. Then the 60-minute timeout is reached and fails the > cluster. > > I deployed this openstack stack using kayobe (kolla-ansible) and this is > version Train. This is deployed on CentOS 7 within docker containers. > Kayobe manages this deployment through the ansible playbooks. > > This was previously working some months back although I think I may have > used coreos image at that time, and that is also not working today. The > deployment would have been back around February 2020. I then deleted that > deployment and re-deployed. The only change being the hostname for > controller node as updated in the inventory file for the kayobe. > Since then which was a month or so back I've been unable to successfully > deploy a kubernetes cluster. I've tried other fedora-atomic images as well > as coreos without success. When using the coreos image and when tagging the > image with the coreos tag as per the magnum docs, the instance fails to > boot and goes to the rescue shell. However if I manually launch the coreos > image then it does successfully boot and get configured via cloud-init. All > of the deployment attempts stop at the same place when using fedora image > and I have a different experience if I disable TLS: > > TLS enabled: master launched, no nodes. Fails when > running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml > > TLS disabled: master and nodes launched but later fails. I > didnt investigate this very much. > > When looking for help around the web, I found this which looks to be the > same issue that I have at the moment (although he's deployed slightly > differently, using centos8 and mentions magnum 10): > > https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ > > > I have the same log messages on the master node within heat. > > When going through the troubleshooting guide I see that etcd is running > and no errors however I dont see any flannel service at all. But I also > don't know if this has simply failed before getting to deploy flannel or > whether flannel is the reason. I did try to deploy using a cluster template > that is using calico as a test but the same result from the logs. > > When looking at the stack via cli to see the failed stacks this is what I > see there: http://paste.openstack.org/show/795736/ > > I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and > 2GB memory. > Storage is only via cinder as I am using iscsi storage with a cinder > driver. I dont have any other storage. > > On the master, after the failure the heat log repeats these logs: > > ++ curl --silent http://127.0.0.1:8080/healthz > + '[' ok = ok ']' > + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch > '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}' > error: no configuration has been provided, try setting KUBERNETES_MASTER > environment variable > Trying to label master node with node-role.kubernetes.io/master="" > + echo 'Trying to label master node with node-role.kubernetes.io/master= > ""' > + sleep 5s > > [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ > [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ > > May I ask if anyone has a recent deployment of Magnum and a working > deployment of kubernetes that could share with me the relevant details like > the image you have used so that I can try and replicate? > > To create the cluster template I have been using: > openstack coe cluster template create k8s-cluster-template \ > --image Fedora-Atomic-27 \ > --keypair testpair \ > --external-network physnet2vlan20 \ > --dns-nameserver 192.168.7.233 \ > --flavor 2GB-2vCPU \ > --docker-volume-size 15 \ > --network-driver flannel \ > --coe kubernetes > > > If I have missed anything, I am happy to provide it. > > Many thanks in advance for any help or pointers on this. > > Regards, > > Tony Pearce > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Mon Jul 13 08:35:39 2020 From: bharat at stackhpc.com (Bharat Kunwar) Date: Mon, 13 Jul 2020 09:35:39 +0100 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: References: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> Message-ID: <89DC9036-27A3-48E6-9AD6-03B6577C9CB5@stackhpc.com> Hi Tony I have not used designate myself so not sure about the exact details but if you are using Kayobe/Kolla-Ansible, we recently proposed these backports to train, https://review.opendev.org/#/c/738882/1/ansible/roles/magnum/templates/magnum.conf.j2 . Magnum queries Keystone catalog for the url instances can use to talk back with Keystone and Magnum itself. Usually this is the public URL but essentially you need to specify an endpoint name which fits the bill. Please check /etc/kolla/magnum-conductor/magnum.conf in your control plane where Magnum is deployed and ensure it it configured to the correct interface. Cheers Bharat > On 13 Jul 2020, at 08:43, Tony Pearce wrote: > > Hi Bharat, many thanks for your super quick response to me last week. I really appreciate that, especially since I had been trying for so long on this issue here. I wanted to try out your suggestion before coming back and creating a reply. > > I tried your suggestion and at first, I got the same experience (failure) when creating a cluster. It appeared to stop in the same place as I described in the mail previous. I noticed some weird things with DNS integration (Designate) during the investigation [1] and [2]. I decided to remove Designate from Openstack and retest and now I am successfully able to deploy a kubernetes cluster! :) > > Regarding those 2 points: > [1] - the configured designate zone was project.cloud.company.com and instance1 would be instance1.project.cloud.company.com however, the kube master instance hostname was getting master.cloud.company.com > [2] - when doing a dns lookup on master.project.cloud.company.com the private IP was being returned instead of the floating IP. This meant that from outside the project, the instance couldnt be pinged by hostname. > > I've removed both magnum and Designate and then redeployed both by first deploying Magnum and testing successful kubernetes cluster deployment using your fix Bharat. Then I deployed Designate again. Issue [1] is still present while issue [2] is resolved and no longer present. Kubernetes cluster deployment is still successful :) > > Thank you once again and have a great week ahead! > > Kind regards, > > Tony Pearce > > > > On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar > wrote: > Hi Tony > > That is a known issue and is due to the default version of heat container agent baked into Train release. Please use label heat_container_agent_tag=train-stable-3 and you should be good to go. > > Cheers > > Bharat > >> On 10 Jul 2020, at 09:18, Tony Pearce > wrote: >> >> Hi team, I hope you are all keeping safe and well at the moment. >> >> I am trying to use magnum to launch a kubernetes cluster. I have tried different images but currently using Fedora-Atomic 27. The cluster deployment from the cluster template is failing and I am here to ask if you could please point me in the right direction? I have become stuck and I am uncertain how to further troubleshoot this. The cluster seems to fail a few minutes after booting up the master node because after I see the logs ([1],[2]), I do not see any progress in terms of new (different) logs or load on the master. Then the 60-minute timeout is reached and fails the cluster. >> >> I deployed this openstack stack using kayobe (kolla-ansible) and this is version Train. This is deployed on CentOS 7 within docker containers. Kayobe manages this deployment through the ansible playbooks. >> >> This was previously working some months back although I think I may have used coreos image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed. The only change being the hostname for controller node as updated in the inventory file for the kayobe. >> Since then which was a month or so back I've been unable to successfully deploy a kubernetes cluster. I've tried other fedora-atomic images as well as coreos without success. When using the coreos image and when tagging the image with the coreos tag as per the magnum docs, the instance fails to boot and goes to the rescue shell. However if I manually launch the coreos image then it does successfully boot and get configured via cloud-init. All of the deployment attempts stop at the same place when using fedora image and I have a different experience if I disable TLS: >> >> TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml >> >> TLS disabled: master and nodes launched but later fails. I didnt investigate this very much. >> >> When looking for help around the web, I found this which looks to be the same issue that I have at the moment (although he's deployed slightly differently, using centos8 and mentions magnum 10): >> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ >> >> I have the same log messages on the master node within heat. >> >> When going through the troubleshooting guide I see that etcd is running and no errors however I dont see any flannel service at all. But I also don't know if this has simply failed before getting to deploy flannel or whether flannel is the reason. I did try to deploy using a cluster template that is using calico as a test but the same result from the logs. >> >> When looking at the stack via cli to see the failed stacks this is what I see there: http://paste.openstack.org/show/795736/ >> >> I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu and 2GB memory. >> Storage is only via cinder as I am using iscsi storage with a cinder driver. I dont have any other storage. >> >> On the master, after the failure the heat log repeats these logs: >> >> ++ curl --silent http://127.0.0.1:8080/healthz >> + '[' ok = ok ']' >> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master ": ""}}}' >> error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable >> Trying to label master node with node-role.kubernetes.io/master= "" >> + echo 'Trying to label master node with node-role.kubernetes.io/master= ""' >> + sleep 5s >> >> [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ >> [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ >> >> May I ask if anyone has a recent deployment of Magnum and a working deployment of kubernetes that could share with me the relevant details like the image you have used so that I can try and replicate? >> >> To create the cluster template I have been using: >> openstack coe cluster template create k8s-cluster-template \ >> --image Fedora-Atomic-27 \ >> --keypair testpair \ >> --external-network physnet2vlan20 \ >> --dns-nameserver 192.168.7.233 \ >> --flavor 2GB-2vCPU \ >> --docker-volume-size 15 \ >> --network-driver flannel \ >> --coe kubernetes >> >> >> If I have missed anything, I am happy to provide it. >> >> Many thanks in advance for any help or pointers on this. >> >> Regards, >> >> Tony Pearce >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tonyppe at gmail.com Mon Jul 13 08:41:44 2020 From: tonyppe at gmail.com (Tony Pearce) Date: Mon, 13 Jul 2020 16:41:44 +0800 Subject: [magnum] failed to launch Kubernetes cluster In-Reply-To: <89DC9036-27A3-48E6-9AD6-03B6577C9CB5@stackhpc.com> References: <59A5430D-6712-4204-867C-EF8E72C18845@stackhpc.com> <89DC9036-27A3-48E6-9AD6-03B6577C9CB5@stackhpc.com> Message-ID: Hi Bharat, Thank you again :) Tony Pearce On Mon, 13 Jul 2020 at 16:35, Bharat Kunwar wrote: > Hi Tony > > I have not used designate myself so not sure about the exact details but > if you are using Kayobe/Kolla-Ansible, we recently proposed these backports > to train, > https://review.opendev.org/#/c/738882/1/ansible/roles/magnum/templates/magnum.conf.j2. > Magnum queries Keystone catalog for the url instances can use to talk back > with Keystone and Magnum itself. Usually this is the public URL but > essentially you need to specify an endpoint name which fits the bill. > Please check /etc/kolla/magnum-conductor/magnum.conf in your control plane > where Magnum is deployed and ensure it it configured to the correct > interface. > > > Cheers > > Bharat > > On 13 Jul 2020, at 08:43, Tony Pearce wrote: > > Hi Bharat, many thanks for your super quick response to me last week. I > really appreciate that, especially since I had been trying for so long on > this issue here. I wanted to try out your suggestion before coming back and > creating a reply. > > I tried your suggestion and at first, I got the same experience (failure) > when creating a cluster. It appeared to stop in the same place as I > described in the mail previous. I noticed some weird things with DNS > integration (Designate) during the investigation [1] and [2]. I decided to > remove Designate from Openstack and retest and now I am successfully able > to deploy a kubernetes cluster! :) > > Regarding those 2 points: > [1] - the configured designate zone was project.cloud.company.com and > instance1 would be instance1.project.cloud.company.com however, the kube > master instance hostname was getting master.cloud.company.com > [2] - when doing a dns lookup on master.project.cloud.company.com the > private IP was being returned instead of the floating IP. This meant that > from outside the project, the instance couldnt be pinged by hostname. > > I've removed both magnum and Designate and then redeployed both by first > deploying Magnum and testing successful kubernetes cluster deployment using > your fix Bharat. Then I deployed Designate again. Issue [1] is still > present while issue [2] is resolved and no longer present. Kubernetes > cluster deployment is still successful :) > > Thank you once again and have a great week ahead! > > Kind regards, > > Tony Pearce > > > > On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar wrote: > >> Hi Tony >> >> That is a known issue and is due to the default version of heat container >> agent baked into Train release. Please use label >> heat_container_agent_tag=train-stable-3 and you should be good to go. >> >> Cheers >> >> Bharat >> >> On 10 Jul 2020, at 09:18, Tony Pearce wrote: >> >> Hi team, I hope you are all keeping safe and well at the moment. >> >> I am trying to use magnum to launch a kubernetes cluster. I have tried >> different images but currently using Fedora-Atomic 27. The cluster >> deployment from the cluster template is failing and I am here to ask if you >> could please point me in the right direction? I have become stuck and I am >> uncertain how to further troubleshoot this. The cluster seems to fail a few >> minutes after booting up the master node because after I see the logs >> ([1],[2]), I do not see any progress in terms of new (different) logs or >> load on the master. Then the 60-minute timeout is reached and fails the >> cluster. >> >> I deployed this openstack stack using kayobe (kolla-ansible) and this is >> version Train. This is deployed on CentOS 7 within docker containers. >> Kayobe manages this deployment through the ansible playbooks. >> >> This was previously working some months back although I think I may have >> used coreos image at that time, and that is also not working today. The >> deployment would have been back around February 2020. I then deleted that >> deployment and re-deployed. The only change being the hostname for >> controller node as updated in the inventory file for the kayobe. >> Since then which was a month or so back I've been unable to successfully >> deploy a kubernetes cluster. I've tried other fedora-atomic images as well >> as coreos without success. When using the coreos image and when tagging the >> image with the coreos tag as per the magnum docs, the instance fails to >> boot and goes to the rescue shell. However if I manually launch the coreos >> image then it does successfully boot and get configured via cloud-init. All >> of the deployment attempts stop at the same place when using fedora image >> and I have a different experience if I disable TLS: >> >> TLS enabled: master launched, no nodes. Fails when >> running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml >> >> TLS disabled: master and nodes launched but later fails. I >> didnt investigate this very much. >> >> When looking for help around the web, I found this which looks to be the >> same issue that I have at the moment (although he's deployed slightly >> differently, using centos8 and mentions magnum 10): >> >> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/ >> >> >> I have the same log messages on the master node within heat. >> >> When going through the troubleshooting guide I see that etcd is running >> and no errors however I dont see any flannel service at all. But I also >> don't know if this has simply failed before getting to deploy flannel or >> whether flannel is the reason. I did try to deploy using a cluster template >> that is using calico as a test but the same result from the logs. >> >> When looking at the stack via cli to see the failed stacks this is what I >> see there: http://paste.openstack.org/show/795736/ >> >> I'm using master node flavour with 4cpu and 4GB memory. Node with 2cpu >> and 2GB memory. >> Storage is only via cinder as I am using iscsi storage with a cinder >> driver. I dont have any other storage. >> >> On the master, after the failure the heat log repeats these logs: >> >> ++ curl --silent http://127.0.0.1:8080/healthz >> + '[' ok = ok ']' >> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch >> '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}' >> error: no configuration has been provided, try setting KUBERNETES_MASTER >> environment variable >> Trying to label master node with node-role.kubernetes.io/master="" >> + echo 'Trying to label master node with node-role.kubernetes.io/master= >> ""' >> + sleep 5s >> >> [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/ >> [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/ >> >> May I ask if anyone has a recent deployment of Magnum and a working >> deployment of kubernetes that could share with me the relevant details like >> the image you have used so that I can try and replicate? >> >> To create the cluster template I have been using: >> openstack coe cluster template create k8s-cluster-template \ >> --image Fedora-Atomic-27 \ >> --keypair testpair \ >> --external-network physnet2vlan20 \ >> --dns-nameserver 192.168.7.233 \ >> --flavor 2GB-2vCPU \ >> --docker-volume-size 15 \ >> --network-driver flannel \ >> --coe kubernetes >> >> >> If I have missed anything, I am happy to provide it. >> >> Many thanks in advance for any help or pointers on this. >> >> Regards, >> >> Tony Pearce >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Jul 13 17:37:57 2020 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 13 Jul 2020 13:37:57 -0400 Subject: [tc] weekly update Message-ID: Hi everyone, Here’s an update for what happened in the OpenStack TC this week. You can get more information by checking for changes in openstack/governance repository. We've also included a few references to some important mailing list threads that you should check out. # Patches ## Open Reviews - [manila] assert:supports-accessible-upgrade https://review.opendev.org/740509 - Add legacy repository validation https://review.opendev.org/737559 - Cleanup the remaining osf repos and their data https://review.opendev.org/739291 - [draft] Add assert:supports-standalone https://review.opendev.org/722399 ## General Changes - Create starter-kit:kubernetes-in-virt tag https://review.opendev.org/736369 - Update goal selection docs to clarify the goal count https://review.opendev.org/739150 - Add "tc:approved-release" tag to manila https://review.opendev.org/738105 # Email Threads - New Office Hours: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015761.html - Summit CFP Open: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015730.html # Other Reminders - OpenStack's 10th anniversary community meeting should be happening July 16th: more info coming soon! - If you're an operator, make sure you fill out our user survey: https://www.openstack.org/user-survey/survey-2020/ - Milestone 2 coming at the end of the month Thanks for reading! Mohammed & Kendall -- Mohammed Naser VEXXHOST, Inc. From rfolco at redhat.com Mon Jul 13 17:42:16 2020 From: rfolco at redhat.com (Rafael Folco) Date: Mon, 13 Jul 2020 14:42:16 -0300 Subject: [tripleo] TripleO CI Summary: Unified Sprint 29 Message-ID: Greetings, The TripleO CI team has just completed **Unified Sprint 29** (June 18 thru July 08). The following is a summary of completed work during this sprint cycle [1]: - Continued building internal component and integration pipelines. - Designed new promoter tests to run on Python3 to reuse common code and adapt molecule scenarios to the new test sequence standard. Design doc can be found at https://hackmd.io/kJqHSTWWRMOIfIhvDMGFLg. - Important changes to the next-generation promoter have been submitted, e.g. QCOW2 promotions https://review.rdoproject.org/r/#/c/27626/. - CentOS8 component and integration pipelines are still in progress to be completed in the next sprint cycle. - Tempest skip list and ironic plugin general improvements. - Ruck/Rover recorded notes [2]. The planned work for the next sprint [3] extends the work started in the previous sprint and focuses on switching container build jobs to the new building system. The Ruck and Rover for this sprint are Sandeep Yadav (ysandeep), Sorin Sbarnea (zbr). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes to be tracked in etherpad [4]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-29 [2] https://hackmd.io/XcuH2OIVTMiuxyrqSF6ocw [3] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-30 [4] https://hackmd.io/6Bx0FXwlRNCc75l39NSKvg -- Folco -------------- next part -------------- An HTML attachment was scrubbed... URL: From haleyb.dev at gmail.com Mon Jul 13 18:01:16 2020 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 13 Jul 2020 14:01:16 -0400 Subject: [neutron] Bug deputy report for week of July 6th Message-ID: Hi, I was Neutron bug deputy last week. Below is a short summary about reported bugs. -Brian Critical bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1886807 - neutron-ovn-tempest-full-multinode-ovs-master job is failing 100% times - Gate failure High bugs --------- * https://bugs.launchpad.net/neutron/+bug/1886956 - Functional test test_restart_wsgi_on_sighup_multiple_workers is failing sometimes - https://review.opendev.org/#/c/740283/ * https://bugs.launchpad.net/neutron/+bug/1886969 - dhcp bulk reload fails with python3 - needs owner * https://bugs.launchpad.net/neutron/+bug/1887148 - Network loop between physical networks with DVR - https://review.opendev.org/#/c/740724/ proposed Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/1886909 - selection_fields for udp and sctp case doesn't work correctly - This is actually a bug in core OVN, and fixed in v20.06.1, should bump to test with a later version - Also related to supporting SCTP w/Octavia and adding that support to the ovn-octavia-provider driver * https://bugs.launchpad.net/neutron/+bug/1886962 - [OVN][QOS] NBDB qos table entries still exist even after corresponding neutron ports are deleted - needs owner * https://bugs.launchpad.net/neutron/+bug/1887108 - wrong l2pop flows on vlan network - asked for more information on config - needs owner * https://bugs.launchpad.net/neutron/+bug/1887163 - Failed to create network or port with dns_domain parameter - possible config issue with two dns extensions loaded at same time Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/1887147 - neutron-linuxbridge-agent looping same as dhcp - actually looks like a configuration issue in the deployment as privsep helper isn't able to start properly Wishlist bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1886798 - [RFE] Port NUMA affinity policy - Port object update for Nova scheduling - Needs discussion in Drivers meeting Further triage required ----------------------- * https://bugs.launchpad.net/neutron/+bug/1886426 - Neutron sending response to rabbitmq exchange with event type "floatingip.update.end" without updating the status of Floating IP - Asked for more information * https://bugs.launchpad.net/neutron/+bug/1886949 - [RFE] Granular metering data in neutron-metering-agent - Update to metering agent - Asked Slawek to take a look as there was talk about deprecating the metering agent From radoslaw.piliszek at gmail.com Mon Jul 13 19:53:03 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Mon, 13 Jul 2020 21:53:03 +0200 Subject: [masakari] Meetings Message-ID: Hello Fellow cloud-HA-seekers, I wanted to attend Masakari meetings but I found the current schedule unfit. Is there a chance to change the schedule? The day is fine but a shift by +3 hours would be nice. Anyhow, I wanted to discuss [1]. I've already proposed a change implementing it and looking forward to positive reviews. :-) That said, please reply on the change directly, or mail me or catch me on IRC, whichever option sounds best to you. [1] https://blueprints.launchpad.net/masakari/+spec/customisable-ha-enabled-instance-metadata-key -yoctozepto From amy at demarco.com Mon Jul 13 23:19:28 2020 From: amy at demarco.com (Amy Marrich) Date: Mon, 13 Jul 2020 18:19:28 -0500 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Hey Tom, Adding the OpenStack discuss list as I think you got several replies from there as well. Thanks, Amy (spotz) On Mon, Jul 13, 2020 at 5:37 PM Thomas King wrote: > Good day, > > I'm bringing up a thread from June about DHCP relay with neutron networks > in Ironic, specifically using unicast relay. The Triple-O docs do not have > the plain config/neutron config to show how a regular Ironic setup would > use DHCP relay. > > The Neutron segments docs state that I must have a unique physical network > name. If my Ironic controller has a single provisioning network with a > single physical network name, doesn't this prevent my use of multiple > segments? > > Further, the segments docs state this: "The operator must ensure that > every compute host that is supposed to participate in a router provider > network has direct connectivity to one of its segments." (section 3 at > https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - > current docs state the same thing) > This defeats the purpose of using DHCP relay, though, where the Ironic > controller does *not* have direct connectivity to the remote segment. > > Here is a rough drawing - what is wrong with my thinking here? > Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay > <------> Ironic controller, provisioning network: 10.146.29.192/26 VLAN > 2115 > > Thank you, > Tom King > _______________________________________________ > openstack-mentoring mailing list > openstack-mentoring at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Tue Jul 14 00:04:16 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Mon, 13 Jul 2020 17:04:16 -0700 Subject: [octavia] Replace broken amphoras In-Reply-To: References: Message-ID: Hi Fabian, Sorry you have run into trouble and we have missed you in the IRC channel. Yeah, that transcript from three years ago isn't going to be much help. A few things we will want to know are: 1. What version of Octavia are you using? 2. Do you have the DNS extension to neutron enabled? 3. When it said "unable to attach port to amphora", can you provide the full error? Was it due to a hostname mismatch error from nova? My guess is you ran into the issue where a port will not attach if the DNS name doesn't match. Our workaround for that accidentally got removed and re-added in https://review.opendev.org/#/c/663277/. Replacing a vrrp_port is tricky, so I'm not surprised you ran into some trouble. Can you please provide the controller worker log output when doing a load balancer failover (let's not use amphora failover here) on paste.openstack.org? You can mark it private and directly reply to me if you have concerns about the log content. All this said, I have recently completely refactored the failover flows recently. This has already merged on the master branch and backports are in process. Michael On Fri, Jul 10, 2020 at 7:07 AM Fabian Zimmermann wrote: > > Hi, > > we had some network issues and now have amphoras which are marked in ERROR state. > > What we already tried: > > - failover the amphora > - failover the loadbalancer > > both did not work, got "unable to attach port to (new) amphora". > > Then we removed the vrrp_port, set the vrrp_port_id to NULL and repeated the amphora failover > > Reverting Err: "PortID: Null" > > Then we created a new vrrp_port as described [1] and added the port-id to the vrrp_port_id and the a suitable vrrp_ip field to our ERRORed amphora entry. > > Restarted failover -> without luck. > > Currently we have an single STANDALONE amphora configured. > > Is there a way to trigger octavia to create new "clean" amphoras for MASTER/BACKUP? > > Thanks, > > Fabian > > [1]http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 From thomas.king at gmail.com Mon Jul 13 23:21:59 2020 From: thomas.king at gmail.com (Thomas King) Date: Mon, 13 Jul 2020 17:21:59 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Thank you, Amy! Tom On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: > Hey Tom, > > Adding the OpenStack discuss list as I think you got several replies from > there as well. > > Thanks, > > Amy (spotz) > > On Mon, Jul 13, 2020 at 5:37 PM Thomas King wrote: > >> Good day, >> >> I'm bringing up a thread from June about DHCP relay with neutron networks >> in Ironic, specifically using unicast relay. The Triple-O docs do not have >> the plain config/neutron config to show how a regular Ironic setup would >> use DHCP relay. >> >> The Neutron segments docs state that I must have a unique physical >> network name. If my Ironic controller has a single provisioning network >> with a single physical network name, doesn't this prevent my use of >> multiple segments? >> >> Further, the segments docs state this: "The operator must ensure that >> every compute host that is supposed to participate in a router provider >> network has direct connectivity to one of its segments." (section 3 at >> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >> current docs state the same thing) >> This defeats the purpose of using DHCP relay, though, where the Ironic >> controller does *not* have direct connectivity to the remote segment. >> >> Here is a rough drawing - what is wrong with my thinking here? >> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay >> <------> Ironic controller, provisioning network: 10.146.29.192/26 VLAN >> 2115 >> >> Thank you, >> Tom King >> _______________________________________________ >> openstack-mentoring mailing list >> openstack-mentoring at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yan.y.zhao at intel.com Mon Jul 13 23:29:57 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Tue, 14 Jul 2020 07:29:57 +0800 Subject: device compatibility interface for live migration with assigned devices Message-ID: <20200713232957.GD5955@joy-OptiPlex-7040> hi folks, we are defining a device migration compatibility interface that helps upper layer stack like openstack/ovirt/libvirt to check if two devices are live migration compatible. The "devices" here could be MDEVs, physical devices, or hybrid of the two. e.g. we could use it to check whether - a src MDEV can migrate to a target MDEV, - a src VF in SRIOV can migrate to a target VF in SRIOV, - a src MDEV can migration to a target VF in SRIOV. (e.g. SIOV/SRIOV backward compatibility case) The upper layer stack could use this interface as the last step to check if one device is able to migrate to another device before triggering a real live migration procedure. we are not sure if this interface is of value or help to you. please don't hesitate to drop your valuable comments. (1) interface definition The interface is defined in below way: __ userspace /\ \ / \write / read \ ________/__________ ___\|/_____________ | migration_version | | migration_version |-->check migration --------------------- --------------------- compatibility device A device B a device attribute named migration_version is defined under each device's sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). userspace tools read the migration_version as a string from the source device, and write it to the migration_version sysfs attribute in the target device. The userspace should treat ANY of below conditions as two devices not compatible: - any one of the two devices does not have a migration_version attribute - error when reading from migration_version attribute of one device - error when writing migration_version string of one device to migration_version attribute of the other device The string read from migration_version attribute is defined by device vendor driver and is completely opaque to the userspace. for a Intel vGPU, string format can be defined like "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". for an NVMe VF connecting to a remote storage. it could be "PCI ID" + "driver version" + "configured remote storage URL" for a QAT VF, it may be "PCI ID" + "driver version" + "supported encryption set". (to avoid namespace confliction from each vendor, we may prefix a driver name to each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) (2) backgrounds The reason we hope the migration_version string is opaque to the userspace is that it is hard to generalize standard comparing fields and comparing methods for different devices from different vendors. Though userspace now could still do a simple string compare to check if two devices are compatible, and result should also be right, it's still too limited as it excludes the possible candidate whose migration_version string fails to be equal. e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible with another MDEV with mdev_type_3, aggregator count 1, even their migration_version strings are not equal. (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). besides that, driver version + configured resources are all elements demanding to take into account. So, we hope leaving the freedom to vendor driver and let it make the final decision in a simple reading from source side and writing for test in the target side way. we then think the device compatibility issues for live migration with assigned devices can be divided into two steps: a. management tools filter out possible migration target devices. Tags could be created according to info from product specification. we think openstack/ovirt may have vendor proprietary components to create those customized tags for each product from each vendor. e.g. for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to search target vGPU are like: a tag for compatible parent PCI IDs, a tag for a range of gvt driver versions, a tag for a range of mdev type + aggregator count for NVMe VF, the tags to search target VF may be like: a tag for compatible PCI IDs, a tag for a range of driver versions, a tag for URL of configured remote storage. b. with the output from step a, openstack/ovirt/libvirt could use our proposed device migration compatibility interface to make sure the two devices are indeed live migration compatible before launching the real live migration process to start stream copying, src device stopping and target device resuming. It is supposed that this step would not bring any performance penalty as -in kernel it's just a simple string decoding and comparing -in openstack/ovirt, it could be done by extending current function check_can_live_migrate_destination, along side claiming target resources.[1] [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html Thanks Yan From mthode at mthode.org Tue Jul 14 04:10:46 2020 From: mthode at mthode.org (Matthew Thode) Date: Mon, 13 Jul 2020 23:10:46 -0500 Subject: Setuptools 48 and Devstack Failures In-Reply-To: <1731c9381f9.c3ec7029419955.5239287898505413558@ghanshyammann.com> References: <91325864-5995-4cf8-ab22-ab0fe3fdd353@www.fastmail.com> <17316cc5b56.1069abf83419719.5856946506321936982@ghanshyammann.com> <1731c9381f9.c3ec7029419955.5239287898505413558@ghanshyammann.com> Message-ID: <20200714041046.hzxrorrisrhdnhrv@mthode.org> On 20-07-04 20:24:55, Ghanshyam Mann wrote: > ---- On Fri, 03 Jul 2020 17:29:18 -0500 Ghanshyam Mann wrote ---- > > ---- On Fri, 03 Jul 2020 14:13:04 -0500 Clark Boylan wrote ---- > > > Hello, > > > > > > Setuptools has made a new version 48 release. This appears to be causing problems for devstack because `pip install -e $PACKAGE_PATH` installs commands to /usr/bin and not /usr/local/bin on Ubuntu as it did in the past. `pip install $PACKAGE_PATH` continues to install to /usr/local/bin as expected. Devstack is failing because keystone-manage cannot currently be found at the specific /usr/local/bin/ path. > > > > > > Potential workarounds for this include not using `pip install -e` or relying on $PATH to find the commands rather than specifying rooted paths to them. I'll defer to the QA team on how they want to address this. While we can have devstack install an older setuptools version as well, generally this is not considered to be a good idea because anyone doing pip installs outside of devstack may get the newer behavior. It is actually important for us to try and keep up with setuptools changes as a result. > > > > > > Fungi indicated that setuptools expected this to be a bumpy upgrade. I'm not sure if they would consider `pip install -e` and `pip install` installing to different paths as a bug, and if they did which behavior is correct. It would probably be a good idea to file a bug upstream if we debug this further. > > > > Yeah, I am not sure how it will go as setuptools bug or an incompatible change and needs to handle on devstack side. > > As this is blocking all gates, let's use the old setuptools temporarily. For now, I filed devstack bug to track > > it and once we figure it out then move to latest setuptools - https://bugs.launchpad.net/devstack/+bug/1886237 > > > > This is patch to use old setuptools- > > - https://review.opendev.org/#/c/739290/ > > Updates: > Issue is when setuptools adopts distutils from the standard library (in 48.0.0) and uses it, downstream packagers customization to distutils will be lost. > - https://github.com/pypa/setuptools/issues/2232 > > setuptools 49.1.0 reverted the adoption of distutils from the standard library and its working now. > > I have closed the devstack bug 1886237 and proposed the revert of capping of setuptools by blacklisting 48.0.0 and 49.0.0 so > that we test with latest setuptools. For now, devstack will pick the 49.1.0 and pass. > - https://review.opendev.org/#/c/739294/2 > > In summary, gate is green and you can recheck on the failed patches. > It looks like they (upstream) are rolling forward with the change. There are workarounds for those that need it (env var). Please see the above linked issue for more information. > -gmann > > > > > > > > Clark > > > > > > > > > > > -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From ruslanas at lpic.lt Tue Jul 14 09:01:03 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 11:01:03 +0200 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: hi, have you checked: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html ? I am following this link. I only have one network, having different issues tho ;) On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: > Thank you, Amy! > > Tom > > On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: > >> Hey Tom, >> >> Adding the OpenStack discuss list as I think you got several replies from >> there as well. >> >> Thanks, >> >> Amy (spotz) >> >> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >> wrote: >> >>> Good day, >>> >>> I'm bringing up a thread from June about DHCP relay with neutron >>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>> not have the plain config/neutron config to show how a regular Ironic setup >>> would use DHCP relay. >>> >>> The Neutron segments docs state that I must have a unique physical >>> network name. If my Ironic controller has a single provisioning network >>> with a single physical network name, doesn't this prevent my use of >>> multiple segments? >>> >>> Further, the segments docs state this: "The operator must ensure that >>> every compute host that is supposed to participate in a router provider >>> network has direct connectivity to one of its segments." (section 3 at >>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>> current docs state the same thing) >>> This defeats the purpose of using DHCP relay, though, where the Ironic >>> controller does *not* have direct connectivity to the remote segment. >>> >>> Here is a rough drawing - what is wrong with my thinking here? >>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay >>> <------> Ironic controller, provisioning network: 10.146.29.192/26 VLAN >>> 2115 >>> >>> Thank you, >>> Tom King >>> _______________________________________________ >>> openstack-mentoring mailing list >>> openstack-mentoring at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>> >> -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Tue Jul 14 12:26:11 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 14:26:11 +0200 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others Message-ID: Hi all, Borry to keep spamming you all the time. But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: - tuned ssh - automatically generated root pass to the one I need - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. - and so on. I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) Thank you in advance. -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From reza.b2008 at gmail.com Tue Jul 14 13:06:22 2020 From: reza.b2008 at gmail.com (Reza Bakhshayeshi) Date: Tue, 14 Jul 2020 17:36:22 +0430 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: Thanks for your information. Actually, I was in doubt of using Ussuri (latest version) for my environment. Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but overcloud image build got some error: $ openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml ... 2020-07-14 12:14:22.714 | Running install-packages install. 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync 2020-07-14 12:14:23.252 | DNF version: 4.2.17 2020-07-14 12:14:23.253 | cachedir: /tmp/yum 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux 8; generic; Linux.x86_64)' 2020-07-14 12:14:23.472 | repo: using cache for: AppStream 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 23:25:16 2020. 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 23:25:12 2020. 2020-07-14 12:14:23.517 | repo: using cache for: extras 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 00:15:26 2020. 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on Tue Jul 14 11:43:38 2020. 2020-07-14 12:14:23.767 | Completion plugin: Generating completion cache... 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is already installed. 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already installed. 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote 2020-07-14 12:14:23.929 | No match for argument: osops-tools-monitoring-oschecks 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker 2020-07-14 12:14:23.936 | No match for argument: crudini 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux 2020-07-14 12:14:23.953 | No match for argument: pacemaker 2020-07-14 12:14:23.957 | No match for argument: pcs 2020-07-14 12:14:23.961 | Error: Unable to find a match: python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs Do you have any idea? On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: > Hi folks, > > On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: > >> I don't believe centos8 containers are available for Train yet. The >> error you're hitting is because it's fetching centos7 containers and >> the ironic container is not backwards compatible between the two >> versions. If you want centos8, use Ussuri. >> >> > fyi we started pushing centos8 train last week - slightly different > namespace - latest current-tripleo containers are pushed to > https://hub.docker.com/u/tripleotraincentos8 > > hope it helps > > >> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi >> wrote: >> > >> > I found following error in ironic and container-puppet-ironic container >> log during installation: >> > >> > puppet-user: Error: >> /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: >> Could not evaluate: Could not retrieve information from environment >> production source(s) file:/tftpboot/ldlinux.c32 >> > >> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi >> wrote: >> >> >> >> Hi, >> >> >> >> I'm going to install OpenStack Train with the help of TripleO on >> CentOS 8, but undercloud installation fails with the following error: >> >> >> >> "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: >> Skipping because of failed dependencies", "puppet-user: Warning: >> /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen >> 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping >> because of failed dependencies", "puppet-user: Notice: Applied catalog in >> 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: >> 97", "puppet-user: Events:", "puppet-user: Failure: 1", >> "puppet-user: Success: 97", "puppet-user: Total: 98", >> "puppet-user: Resources:", "puppet-user: Failed: 1", >> "puppet-user: Skipped: 41", "puppet-user: Changed: 97", >> "puppet-user: Out of sync: 98", "puppet-user: Total: >> 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", >> "puppet-user: Concat file: 0.00", "puppet-user: Anchor: >> 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: >> Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: >> Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", >> "puppet-user: Catalog application: 1.72", "puppet-user: Last >> run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: >> Total: 1.72", "puppet-user: Version:", "puppet-user: >> Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ >> '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit >> 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying >> running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed >> running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- >> Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 >> ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: >> 95117 -- ERROR configuring zaqar"]} >> >> >> >> Any suggestion would be grateful. >> >> Regards, >> >> Reza >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Jul 14 13:11:33 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:11:33 -0600 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi wrote: > > Thanks for your information. > Actually, I was in doubt of using Ussuri (latest version) for my environment. > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but overcloud image build got some error: > > $ openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > > ... > 2020-07-14 12:14:22.714 | Running install-packages install. > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux 8; generic; Linux.x86_64)' > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 23:25:16 2020. > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 23:25:12 2020. > 2020-07-14 12:14:23.517 | repo: using cache for: extras > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 00:15:26 2020. > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on Tue Jul 14 11:43:38 2020. > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion cache... > 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient > 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient > 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient > 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient > 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient > 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient > 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient > 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is already installed. > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already installed. > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote > 2020-07-14 12:14:23.929 | No match for argument: osops-tools-monitoring-oschecks > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker > 2020-07-14 12:14:23.936 | No match for argument: crudini > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux > 2020-07-14 12:14:23.953 | No match for argument: pacemaker > 2020-07-14 12:14:23.957 | No match for argument: pcs > 2020-07-14 12:14:23.961 | Error: Unable to find a match: python3-aodhclient python3-barbicanclient python3-cinderclient python3-designateclient python3-glanceclient python3-gnocchiclient python3-heatclient python3-ironicclient python3-keystoneclient python3-manilaclient python3-mistralclient python3-neutronclient python3-novaclient python3-openstackclient python3-pankoclient python3-saharaclient python3-swiftclient python3-zaqarclient pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs > > Do you have any idea? > Seems like you are missing the correct DIP_YUM_REPO_CONF setting per #3 from https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images > > > On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: >> >> Hi folks, >> >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz wrote: >>> >>> I don't believe centos8 containers are available for Train yet. The >>> error you're hitting is because it's fetching centos7 containers and >>> the ironic container is not backwards compatible between the two >>> versions. If you want centos8, use Ussuri. >>> >> >> fyi we started pushing centos8 train last week - slightly different namespace - latest current-tripleo containers are pushed to https://hub.docker.com/u/tripleotraincentos8 >> >> hope it helps >> >>> >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi wrote: >>> > >>> > I found following error in ironic and container-puppet-ironic container log during installation: >>> > >>> > puppet-user: Error: /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: Could not evaluate: Could not retrieve information from environment production source(s) file:/tftpboot/ldlinux.c32 >>> > >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi wrote: >>> >> >>> >> Hi, >>> >> >>> >> I'm going to install OpenStack Train with the help of TripleO on CentOS 8, but undercloud installation fails with the following error: >>> >> >>> >> "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: Skipping because of failed dependencies", "puppet-user: Warning: /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping because of failed dependencies", "puppet-user: Notice: Applied catalog in 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: 97", "puppet-user: Events:", "puppet-user: Failure: 1", "puppet-user: Success: 97", "puppet-user: Total: 98", "puppet-user: Resources:", "puppet-user: Failed: 1", "puppet-user: Skipped: 41", "puppet-user: Changed: 97", "puppet-user: Out of sync: 98", "puppet-user: Total: 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", "puppet-user: Concat file: 0.00", "puppet-user: Anchor: 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", "puppet-user: Catalog application: 1.72", "puppet-user: Last run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: Total: 1.72", "puppet-user: Version:", "puppet-user: Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: 95117 -- ERROR configuring zaqar"]} >>> >> >>> >> Any suggestion would be grateful. >>> >> Regards, >>> >> Reza >>> >> >>> >> >>> >>> From aschultz at redhat.com Tue Jul 14 13:22:34 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:22:34 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: > > Hi all, > > Borry to keep spamming you all the time. > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: These don't necessarily need to be done in the image itself but you can virt customize the image prior to uploading it to the undercloud to inject some things. We provide ways of configuring these things at deployment time. > - tuned ssh We have sshd configured via a service. Available options are listed in the service file: https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > - automatically generated root pass to the one I need This can be done via a firstboot script. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) You'd probably want to do this via a first boot as well. If you are deploying with overcloud images, technically you shouldn't need a proxy on install but you'd likely need one for subsequent updates. > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. See KernelArgs. https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > - and so on. > Your best reference for what is available is likely going to be by looking in the THT/deployment folder for the service configurations. We don't expose everything but we do allow configurability for a significant amount of options. *ExtraConfig can be used to tweak additional options that we don't necessarily expose directly if you know what options need to be set via the appropriate puppet modules. If there are services we don't actually configure, you can define your own custom tripleo service templates and add them to the roles to do whatever you want. > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) > > Thank you in advance. > > -- > Ruslanas Gžibovskis > +370 6030 7030 From aschultz at redhat.com Tue Jul 14 13:29:11 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:29:11 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 7:22 AM Alex Schultz wrote: > > On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: > > > > Hi all, > > > > Borry to keep spamming you all the time. > > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: > > These don't necessarily need to be done in the image itself but you > can virt customize the image prior to uploading it to the undercloud > to inject some things. We provide ways of configuring these things at > deployment time. > > > - tuned ssh > > We have sshd configured via a service. Available options are listed in > the service file: > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > > > - automatically generated root pass to the one I need > > This can be done via a firstboot script. > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html Forgot to include this but we ship an example specifically for this: https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/firstboot/userdata_root_password.yaml > > > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) > > You'd probably want to do this via a first boot as well. If you are > deploying with overcloud images, technically you shouldn't need a > proxy on install but you'd likely need one for subsequent updates. > > > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. > > See KernelArgs. > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 > > https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > > > - and so on. > > > > Your best reference for what is available is likely going to be by > looking in the THT/deployment folder for the service configurations. > We don't expose everything but we do allow configurability for a > significant amount of options. *ExtraConfig can be used to tweak > additional options that we don't necessarily expose directly if you > know what options need to be set via the appropriate puppet modules. > If there are services we don't actually configure, you can define your > own custom tripleo service templates and add them to the roles to do > whatever you want. > > > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) > > > > Thank you in advance. > > > > -- > > Ruslanas Gžibovskis > > +370 6030 7030 From emilien at redhat.com Tue Jul 14 13:30:00 2020 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 14 Jul 2020 09:30:00 -0400 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core Message-ID: Hi folks, Rabi has proved deep technical understanding on the TripleO components over the last years. Initially as a major maintainer of the Heat project and then a regular contributor to TripleO, he got involved at different levels: - Optimization of the Heat templates, to reduce the number of resources or improve them to make it faster and more efficient at scale. - Migration of the Mistral workflows into native Ansible modules and Python code into tripleo-common, with end-to-end expertise. - Regular contributions to the container tooling integration. Being involved on the mailing-list and IRC channels, Rabi is always helpful to the community and here to help. He has provided thorough reviews in principal components on TripleO as well as a lot of bug fixes or new features; which contributed to make TripleO more stable and scalable. I would like to propose him be part of the TripleO core team. Thanks Rabi for your hard work! -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Tue Jul 14 13:37:35 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 16:37:35 +0300 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: Thank you Alex. I have read around in this mailinglist, that firstboot will be removed. So I was curious, what way forward to have in case it is depricated. For modifying overcloud-full.qcow2 with virt-customise it do not look nice, would be prety to do it on image generation, not sure where tho... maybe writing own module might do the trick, but I find it as dirty workaround :)) yes, for osp modules, i know how to use puppet to provide needed values. I thought this might be for everything. regarding proxy in certain compute, it needs to do dnf update for centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am confused why, but it does so. On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: > On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis > wrote: > > > > Hi all, > > > > Borry to keep spamming you all the time. > > But could you help me to find a correct place to "modify" image content > (packages installed and not installed) and files and services configured in > an "adjusted" way so I would have for example: > > These don't necessarily need to be done in the image itself but you > can virt customize the image prior to uploading it to the undercloud > to inject some things. We provide ways of configuring these things at > deployment time. > > > - tuned ssh > > We have sshd configured via a service. Available options are listed in > the service file: > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > > > - automatically generated root pass to the one I need > > This can be done via a firstboot script. > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html > > > - Also added proxy config to /etc/yum.conf to certain computes, and > other would be used without proxy (maybe extraconfig option?) > > You'd probably want to do this via a first boot as well. If you are > deploying with overcloud images, technically you shouldn't need a > proxy on install but you'd likely need one for subsequent updates. > > > - set up kernel parameters, so I would have console output duplicated > to serial connection and to iDRAC serial, so I could see login screen over > idrac ssh. > > See KernelArgs. > > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 > > > https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > > > - and so on. > > > > Your best reference for what is available is likely going to be by > looking in the THT/deployment folder for the service configurations. > We don't expose everything but we do allow configurability for a > significant amount of options. *ExtraConfig can be used to tweak > additional options that we don't necessarily expose directly if you > know what options need to be set via the appropriate puppet modules. > If there are services we don't actually configure, you can define your > own custom tripleo service templates and add them to the roles to do > whatever you want. > > > I believe many of those things can be done over extraconfig, I just do > not know options to modify. maybe you can point me like a blind hen into a > correct bowl? :))) > > > > Thank you in advance. > > > > -- > > Ruslanas Gžibovskis > > +370 6030 7030 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Jul 14 13:44:44 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:44:44 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 7:37 AM Ruslanas Gžibovskis wrote: > > Thank you Alex. > > I have read around in this mailinglist, that firstboot will be removed. So I was curious, what way forward to have in case it is depricated. > For Ussuri it's still available. In future versions we'll be switching out how we provision nodes which means the firstboot interface likely will go away and be replaced with something else during provisioning. However it's still currently valid. > For modifying overcloud-full.qcow2 with virt-customise it do not look nice, would be prety to do it on image generation, not sure where tho... maybe writing own module might do the trick, but I find it as dirty workaround :)) virt-customize is probably the easiest thing to just inject something unmanaged into the environment. You can technically use an AllNodesExtraConfig (example THT/environment/enable-swap.yaml & THT/extraconfig/all_nodes/swap.yaml) to do some custom script at installation time as well to manage the files. However this uses a Heat SoftwareConfig which is also deprecated. Though i'm not certain we have an official replacement for that yet. > > yes, for osp modules, i know how to use puppet to provide needed values. I thought this might be for everything. > > regarding proxy in certain compute, it needs to do dnf update for centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am confused why, but it does so. > > On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: >> >> On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: >> > >> > Hi all, >> > >> > Borry to keep spamming you all the time. >> > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: >> >> These don't necessarily need to be done in the image itself but you >> can virt customize the image prior to uploading it to the undercloud >> to inject some things. We provide ways of configuring these things at >> deployment time. >> >> > - tuned ssh >> >> We have sshd configured via a service. Available options are listed in >> the service file: >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml >> >> > - automatically generated root pass to the one I need >> >> This can be done via a firstboot script. >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html >> >> > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) >> >> You'd probably want to do this via a first boot as well. If you are >> deploying with overcloud images, technically you shouldn't need a >> proxy on install but you'd likely need one for subsequent updates. >> >> > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. >> >> See KernelArgs. >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 >> >> https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e >> >> > - and so on. >> > >> >> Your best reference for what is available is likely going to be by >> looking in the THT/deployment folder for the service configurations. >> We don't expose everything but we do allow configurability for a >> significant amount of options. *ExtraConfig can be used to tweak >> additional options that we don't necessarily expose directly if you >> know what options need to be set via the appropriate puppet modules. >> If there are services we don't actually configure, you can define your >> own custom tripleo service templates and add them to the roles to do >> whatever you want. >> >> > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) >> > >> > Thank you in advance. >> > >> > -- >> > Ruslanas Gžibovskis >> > +370 6030 7030 >> From aschultz at redhat.com Tue Jul 14 13:45:15 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 07:45:15 -0600 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 On Tue, Jul 14, 2020 at 7:39 AM Emilien Macchi wrote: > > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components over the last years. > Initially as a major maintainer of the Heat project and then a regular contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as well as a lot of bug fixes or new features; which contributed to make TripleO more stable and scalable. I would like to propose him be part of the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi From ruslanas at lpic.lt Tue Jul 14 13:50:12 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 15:50:12 +0200 Subject: [TripleO] [Train] CentOS 8: Undercloud installation fails In-Reply-To: References: Message-ID: I am not sure, but that might help. I use these steps for deployment: cp -ar /etc/yum.repos.d repos sed -i s/gpgcheck=1/gpgcheck=0/g repos/*repo export DIB_YUM_REPO_CONF="$(ls /home/stack/repos/*repo)" export STABLE_RELEASE="ussuri" export OS_YAML="/usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml" source /home/stack/stackrc mkdir /home/stack/images cd /home/stack/images openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml && openstack overcloud image upload --update-existing cd /home/stack ls /home/stack/images this works for all packages except: pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini openstack-selinux pacemaker pcs to solve these you need to enable in repos dir HA repo (change in enable=0 to enable=1 and then this will solve you issues with most except: osops-tools-monitoring-oschecks this one, you can change by: modify line in file: /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map to have this line: "oschecks_package": "sysstat" instead of "oschecks_package": "osops-tools-monitoring-oschecks " On Tue, 14 Jul 2020 at 15:14, Alex Schultz wrote: > On Tue, Jul 14, 2020 at 7:06 AM Reza Bakhshayeshi > wrote: > > > > Thanks for your information. > > Actually, I was in doubt of using Ussuri (latest version) for my > environment. > > Anyway, Undercloud Ussuri installed like a charm on CentOS 8, but > overcloud image build got some error: > > > > $ openstack overcloud image build --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-python3.yaml > --config-file > /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos8.yaml > > > > ... > > 2020-07-14 12:14:22.714 | Running install-packages install. > > 2020-07-14 12:14:22.714 | + dnf -v -y install python3-aodhclient > python3-barbicanclient python3-cinderclient python3-designateclient > python3-glanceclient python3-gnocchiclient python3-heatclient > python3-ironicclient python3-keystoneclient python3-manilaclient > python3-mistralclient python3-neutronclient python3-novaclient > python3-openstackclient python3-pankoclient python3-saharaclient > python3-swiftclient python3-zaqarclient dpdk driverctl nfs-utils chrony > pacemaker-remote cyrus-sasl-scram tuned-profiles-cpu-partitioning > osops-tools-monitoring-oschecks aide ansible-pacemaker crudini gdisk podman > libreswan openstack-selinux net-snmp numactl iptables-services tmpwatch > openssl-perl lvm2 chrony certmonger fence-agents-all fence-virt > ipa-admintools ipa-client ipxe-bootimgs nfs-utils chrony pacemaker pcs > > 2020-07-14 12:14:23.251 | Loaded plugins: builddep, changelog, > config-manager, copr, debug, debuginfo-install, download, > generate_completion_cache, needs-restarting, playground, repoclosure, > repodiff, repograph, repomanage, reposync > > 2020-07-14 12:14:23.252 | DNF version: 4.2.17 > > 2020-07-14 12:14:23.253 | cachedir: /tmp/yum > > 2020-07-14 12:14:23.278 | User-Agent: constructed: 'libdnf (CentOS Linux > 8; generic; Linux.x86_64)' > > 2020-07-14 12:14:23.472 | repo: using cache for: AppStream > > 2020-07-14 12:14:23.493 | AppStream: using metadata from Tue Jul 7 > 23:25:16 2020. > > 2020-07-14 12:14:23.495 | repo: using cache for: BaseOS > > 2020-07-14 12:14:23.517 | BaseOS: using metadata from Tue Jul 7 > 23:25:12 2020. > > 2020-07-14 12:14:23.517 | repo: using cache for: extras > > 2020-07-14 12:14:23.518 | extras: using metadata from Fri Jun 5 > 00:15:26 2020. > > 2020-07-14 12:14:23.519 | Last metadata expiration check: 0:30:45 ago on > Tue Jul 14 11:43:38 2020. > > 2020-07-14 12:14:23.767 | Completion plugin: Generating completion > cache... > > 2020-07-14 12:14:23.850 | No match for argument: python3-aodhclient > > 2020-07-14 12:14:23.854 | No match for argument: python3-barbicanclient > > 2020-07-14 12:14:23.858 | No match for argument: python3-cinderclient > > 2020-07-14 12:14:23.862 | No match for argument: python3-designateclient > > 2020-07-14 12:14:23.865 | No match for argument: python3-glanceclient > > 2020-07-14 12:14:23.869 | No match for argument: python3-gnocchiclient > > 2020-07-14 12:14:23.873 | No match for argument: python3-heatclient > > 2020-07-14 12:14:23.876 | No match for argument: python3-ironicclient > > 2020-07-14 12:14:23.880 | No match for argument: python3-keystoneclient > > 2020-07-14 12:14:23.884 | No match for argument: python3-manilaclient > > 2020-07-14 12:14:23.887 | No match for argument: python3-mistralclient > > 2020-07-14 12:14:23.891 | No match for argument: python3-neutronclient > > 2020-07-14 12:14:23.895 | No match for argument: python3-novaclient > > 2020-07-14 12:14:23.898 | No match for argument: python3-openstackclient > > 2020-07-14 12:14:23.902 | No match for argument: python3-pankoclient > > 2020-07-14 12:14:23.906 | No match for argument: python3-saharaclient > > 2020-07-14 12:14:23.910 | No match for argument: python3-swiftclient > > 2020-07-14 12:14:23.915 | No match for argument: python3-zaqarclient > > 2020-07-14 12:14:23.920 | Package nfs-utils-1:2.3.3-31.el8.x86_64 is > already installed. > > 2020-07-14 12:14:23.921 | Package chrony-3.5-1.el8.x86_64 is already > installed. > > 2020-07-14 12:14:23.924 | No match for argument: pacemaker-remote > > 2020-07-14 12:14:23.929 | No match for argument: > osops-tools-monitoring-oschecks > > 2020-07-14 12:14:23.933 | No match for argument: ansible-pacemaker > > 2020-07-14 12:14:23.936 | No match for argument: crudini > > 2020-07-14 12:14:23.942 | No match for argument: openstack-selinux > > 2020-07-14 12:14:23.953 | No match for argument: pacemaker > > 2020-07-14 12:14:23.957 | No match for argument: pcs > > 2020-07-14 12:14:23.961 | Error: Unable to find a match: > python3-aodhclient python3-barbicanclient python3-cinderclient > python3-designateclient python3-glanceclient python3-gnocchiclient > python3-heatclient python3-ironicclient python3-keystoneclient > python3-manilaclient python3-mistralclient python3-neutronclient > python3-novaclient python3-openstackclient python3-pankoclient > python3-saharaclient python3-swiftclient python3-zaqarclient > pacemaker-remote osops-tools-monitoring-oschecks ansible-pacemaker crudini > openstack-selinux pacemaker pcs > > > > Do you have any idea? > > > > Seems like you are missing the correct DIP_YUM_REPO_CONF setting per > #3 from > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html#get-images > > > > > > > On Mon, 13 Jul 2020 at 10:50, Marios Andreou wrote: > >> > >> Hi folks, > >> > >> On Mon, Jul 13, 2020 at 12:13 AM Alex Schultz > wrote: > >>> > >>> I don't believe centos8 containers are available for Train yet. The > >>> error you're hitting is because it's fetching centos7 containers and > >>> the ironic container is not backwards compatible between the two > >>> versions. If you want centos8, use Ussuri. > >>> > >> > >> fyi we started pushing centos8 train last week - slightly different > namespace - latest current-tripleo containers are pushed to > https://hub.docker.com/u/tripleotraincentos8 > >> > >> hope it helps > >> > >>> > >>> On Sat, Jul 11, 2020 at 7:03 AM Reza Bakhshayeshi < > reza.b2008 at gmail.com> wrote: > >>> > > >>> > I found following error in ironic and container-puppet-ironic > container log during installation: > >>> > > >>> > puppet-user: Error: > /Stage[main]/Ironic::Pxe/Ironic::Pxe::Tftpboot_file[ldlinux.c32]/File[/var/lib/ironic/tftpboot/ldlinux.c32]: > Could not evaluate: Could not retrieve information from environment > production source(s) file:/tftpboot/ldlinux.c32 > >>> > > >>> > On Wed, 8 Jul 2020 at 16:09, Reza Bakhshayeshi > wrote: > >>> >> > >>> >> Hi, > >>> >> > >>> >> I'm going to install OpenStack Train with the help of TripleO on > CentOS 8, but undercloud installation fails with the following error: > >>> >> > >>> >> "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/Concat_file[10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat[10-zaqar_wsgi.conf]/File[/etc/httpd/conf.d/10-zaqar_wsgi.conf]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-apache-header]/Concat_fragment[zaqar_wsgi-apache-header]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-docroot]/Concat_fragment[zaqar_wsgi-docroot]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-directories]/Concat_fragment[zaqar_wsgi-directories]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-logging]/Concat_fragment[zaqar_wsgi-logging]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-serversignature]/Concat_fragment[zaqar_wsgi-serversignature]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-access_log]/Concat_fragment[zaqar_wsgi-access_log]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-setenv]/Concat_fragment[zaqar_wsgi-setenv]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-wsgi]/Concat_fragment[zaqar_wsgi-wsgi]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-custom_fragment]/Concat_fragment[zaqar_wsgi-custom_fragment]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Concat::Fragment[zaqar_wsgi-file_footer]/Concat_fragment[zaqar_wsgi-file_footer]: > Skipping because of failed dependencies", "puppet-user: Warning: > /Stage[main]/Zaqar::Wsgi::Apache/Openstacklib::Wsgi::Apache[zaqar_wsgi]/Apache::Vhost[zaqar_wsgi]/Apache::Listen[192.168.24.1:8888]/Concat::Fragment[Listen > 192.168.24.1:8888]/Concat_fragment[Listen 192.168.24.1:8888]: Skipping > because of failed dependencies", "puppet-user: Notice: Applied catalog in > 1.72 seconds", "puppet-user: Changes:", "puppet-user: Total: > 97", "puppet-user: Events:", "puppet-user: Failure: 1", > "puppet-user: Success: 97", "puppet-user: Total: 98", > "puppet-user: Resources:", "puppet-user: Failed: 1", > "puppet-user: Skipped: 41", "puppet-user: Changed: 97", > "puppet-user: Out of sync: 98", "puppet-user: Total: > 235", "puppet-user: Time:", "puppet-user: Resources: 0.00", > "puppet-user: Concat file: 0.00", "puppet-user: Anchor: > 0.00", "puppet-user: Concat fragment: 0.00", "puppet-user: > Augeas: 0.03", "puppet-user: File: 0.39", "puppet-user: > Zaqar config: 0.61", "puppet-user: Transaction evaluation: 1.69", > "puppet-user: Catalog application: 1.72", "puppet-user: Last > run: 1594207735", "puppet-user: Config retrieval: 4.14", "puppet-user: > Total: 1.72", "puppet-user: Version:", "puppet-user: > Config: 1594207730", "puppet-user: Puppet: 5.5.10", "+ rc=6", "+ > '[' False = false ']'", "+ set -e", "+ '[' 6 -ne 2 -a 6 -ne 0 ']'", "+ exit > 6", " attempt(s): 3", "2020-07-08 15:59:00,478 WARNING: 95123 -- Retrying > running container: zaqar", "2020-07-08 15:59:00,478 ERROR: 95123 -- Failed > running container for zaqar", "2020-07-08 15:59:00,478 INFO: 95123 -- > Finished processing puppet configs for zaqar", "2020-07-08 15:59:00,482 > ERROR: 95117 -- ERROR configuring ironic", "2020-07-08 15:59:00,484 ERROR: > 95117 -- ERROR configuring zaqar"]} > >>> >> > >>> >> Any suggestion would be grateful. > >>> >> Regards, > >>> >> Reza > >>> >> > >>> >> > >>> > >>> > > > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Tue Jul 14 13:55:53 2020 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Tue, 14 Jul 2020 09:55:53 -0400 Subject: [cinder] cinderlib reviews needed Message-ID: <57417de5-5ee8-4778-84eb-7ddf81e0e791@gmail.com> cinderlib is on a cycle-trailing release model and the Ussuri release is coming up soon. Because it's still a new project, I thought I'd send a reminder in case it fell off your radar. These patches need to merge before we cut the release: https://review.opendev.org/720553 https://review.opendev.org/738226 https://review.opendev.org/738473 https://review.opendev.org/739190 https://review.opendev.org/738230 https://review.opendev.org/738866 https://review.opendev.org/738472 https://review.opendev.org/738213 They each have a single +2 at the moment, and they are all short, focused patches. cheers, brian From ruslanas at lpic.lt Tue Jul 14 14:33:40 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Tue, 14 Jul 2020 16:33:40 +0200 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: and by the way, this is what I get in overcloud hosts: *cat centos7-rt.repo* [centos7-rt] name=CentOS 7 - Realtime baseurl=http://mirror.centos.org/centos/7/rt/x86_64/ enabled=1 gpgcheck=0 even it has a centos8 running ;) looks like, some hardcoded yaml file is still inplace :) On Tue, 14 Jul 2020 at 15:45, Alex Schultz wrote: > On Tue, Jul 14, 2020 at 7:37 AM Ruslanas Gžibovskis > wrote: > > > > Thank you Alex. > > > > I have read around in this mailinglist, that firstboot will be removed. > So I was curious, what way forward to have in case it is depricated. > > > > For Ussuri it's still available. In future versions we'll be switching > out how we provision nodes which means the firstboot interface likely > will go away and be replaced with something else during provisioning. > However it's still currently valid. > > > For modifying overcloud-full.qcow2 with virt-customise it do not look > nice, would be prety to do it on image generation, not sure where tho... > maybe writing own module might do the trick, but I find it as dirty > workaround :)) > > virt-customize is probably the easiest thing to just inject something > unmanaged into the environment. You can technically use an > AllNodesExtraConfig (example THT/environment/enable-swap.yaml & > THT/extraconfig/all_nodes/swap.yaml) to do some custom script at > installation time as well to manage the files. However this uses a > Heat SoftwareConfig which is also deprecated. Though i'm not certain > we have an official replacement for that yet. > > > > > yes, for osp modules, i know how to use puppet to provide needed values. > I thought this might be for everything. > > > > regarding proxy in certain compute, it needs to do dnf update for > centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am > confused why, but it does so. > > > > On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: > >> > >> On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis > wrote: > >> > > >> > Hi all, > >> > > >> > Borry to keep spamming you all the time. > >> > But could you help me to find a correct place to "modify" image > content (packages installed and not installed) and files and services > configured in an "adjusted" way so I would have for example: > >> > >> These don't necessarily need to be done in the image itself but you > >> can virt customize the image prior to uploading it to the undercloud > >> to inject some things. We provide ways of configuring these things at > >> deployment time. > >> > >> > - tuned ssh > >> > >> We have sshd configured via a service. Available options are listed in > >> the service file: > >> > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml > >> > >> > - automatically generated root pass to the one I need > >> > >> This can be done via a firstboot script. > >> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html > >> > >> > - Also added proxy config to /etc/yum.conf to certain computes, and > other would be used without proxy (maybe extraconfig option?) > >> > >> You'd probably want to do this via a first boot as well. If you are > >> deploying with overcloud images, technically you shouldn't need a > >> proxy on install but you'd likely need one for subsequent updates. > >> > >> > - set up kernel parameters, so I would have console output > duplicated to serial connection and to iDRAC serial, so I could see login > screen over idrac ssh. > >> > >> See KernelArgs. > >> > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 > >> > >> > https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e > >> > >> > - and so on. > >> > > >> > >> Your best reference for what is available is likely going to be by > >> looking in the THT/deployment folder for the service configurations. > >> We don't expose everything but we do allow configurability for a > >> significant amount of options. *ExtraConfig can be used to tweak > >> additional options that we don't necessarily expose directly if you > >> know what options need to be set via the appropriate puppet modules. > >> If there are services we don't actually configure, you can define your > >> own custom tripleo service templates and add them to the roles to do > >> whatever you want. > >> > >> > I believe many of those things can be done over extraconfig, I just > do not know options to modify. maybe you can point me like a blind hen into > a correct bowl? :))) > >> > > >> > Thank you in advance. > >> > > >> > -- > >> > Ruslanas Gžibovskis > >> > +370 6030 7030 > >> > > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Tue Jul 14 14:37:25 2020 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 14 Jul 2020 08:37:25 -0600 Subject: [TripleO][CentOS8][Ussuri] overcloud-full image creation to add kernel options and proxy and others In-Reply-To: References: Message-ID: https://review.opendev.org/#/c/738154/ On Tue, Jul 14, 2020 at 8:34 AM Ruslanas Gžibovskis wrote: > > and by the way, this is what I get in overcloud hosts: > cat centos7-rt.repo > [centos7-rt] > name=CentOS 7 - Realtime > baseurl=http://mirror.centos.org/centos/7/rt/x86_64/ > enabled=1 > gpgcheck=0 > > even it has a centos8 running ;) > looks like, some hardcoded yaml file is still inplace :) > > On Tue, 14 Jul 2020 at 15:45, Alex Schultz wrote: >> >> On Tue, Jul 14, 2020 at 7:37 AM Ruslanas Gžibovskis wrote: >> > >> > Thank you Alex. >> > >> > I have read around in this mailinglist, that firstboot will be removed. So I was curious, what way forward to have in case it is depricated. >> > >> >> For Ussuri it's still available. In future versions we'll be switching >> out how we provision nodes which means the firstboot interface likely >> will go away and be replaced with something else during provisioning. >> However it's still currently valid. >> >> > For modifying overcloud-full.qcow2 with virt-customise it do not look nice, would be prety to do it on image generation, not sure where tho... maybe writing own module might do the trick, but I find it as dirty workaround :)) >> >> virt-customize is probably the easiest thing to just inject something >> unmanaged into the environment. You can technically use an >> AllNodesExtraConfig (example THT/environment/enable-swap.yaml & >> THT/extraconfig/all_nodes/swap.yaml) to do some custom script at >> installation time as well to manage the files. However this uses a >> Heat SoftwareConfig which is also deprecated. Though i'm not certain >> we have an official replacement for that yet. >> >> > >> > yes, for osp modules, i know how to use puppet to provide needed values. I thought this might be for everything. >> > >> > regarding proxy in certain compute, it needs to do dnf update for centos7-rt repo (yes OS is centos8, but repo it has centos-7)... I am confused why, but it does so. >> > >> > On Tue, 14 Jul 2020, 16:23 Alex Schultz, wrote: >> >> >> >> On Tue, Jul 14, 2020 at 6:32 AM Ruslanas Gžibovskis wrote: >> >> > >> >> > Hi all, >> >> > >> >> > Borry to keep spamming you all the time. >> >> > But could you help me to find a correct place to "modify" image content (packages installed and not installed) and files and services configured in an "adjusted" way so I would have for example: >> >> >> >> These don't necessarily need to be done in the image itself but you >> >> can virt customize the image prior to uploading it to the undercloud >> >> to inject some things. We provide ways of configuring these things at >> >> deployment time. >> >> >> >> > - tuned ssh >> >> >> >> We have sshd configured via a service. Available options are listed in >> >> the service file: >> >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/sshd/sshd-baremetal-puppet.yaml >> >> >> >> > - automatically generated root pass to the one I need >> >> >> >> This can be done via a firstboot script. >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/extra_config.html >> >> >> >> > - Also added proxy config to /etc/yum.conf to certain computes, and other would be used without proxy (maybe extraconfig option?) >> >> >> >> You'd probably want to do this via a first boot as well. If you are >> >> deploying with overcloud images, technically you shouldn't need a >> >> proxy on install but you'd likely need one for subsequent updates. >> >> >> >> > - set up kernel parameters, so I would have console output duplicated to serial connection and to iDRAC serial, so I could see login screen over idrac ssh. >> >> >> >> See KernelArgs. >> >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/ussuri/deployment/kernel/kernel-boot-params-baremetal-ansible.yaml#L35 >> >> >> >> https://opendev.org/openstack/tripleo-heat-templates/commit/a3e4a9063612a617105e318e422d90706e4ed43e >> >> >> >> > - and so on. >> >> > >> >> >> >> Your best reference for what is available is likely going to be by >> >> looking in the THT/deployment folder for the service configurations. >> >> We don't expose everything but we do allow configurability for a >> >> significant amount of options. *ExtraConfig can be used to tweak >> >> additional options that we don't necessarily expose directly if you >> >> know what options need to be set via the appropriate puppet modules. >> >> If there are services we don't actually configure, you can define your >> >> own custom tripleo service templates and add them to the roles to do >> >> whatever you want. >> >> >> >> > I believe many of those things can be done over extraconfig, I just do not know options to modify. maybe you can point me like a blind hen into a correct bowl? :))) >> >> > >> >> > Thank you in advance. >> >> > >> >> > -- >> >> > Ruslanas Gžibovskis >> >> > +370 6030 7030 >> >> >> > > > -- > Ruslanas Gžibovskis > +370 6030 7030 From jungleboyj at gmail.com Tue Jul 14 14:41:59 2020 From: jungleboyj at gmail.com (Jay Bryant) Date: Tue, 14 Jul 2020 09:41:59 -0500 Subject: [cinder] cinderlib reviews needed In-Reply-To: <57417de5-5ee8-4778-84eb-7ddf81e0e791@gmail.com> References: <57417de5-5ee8-4778-84eb-7ddf81e0e791@gmail.com> Message-ID: Brian, Thanks for highlighting.  I have taken care of most of these. There was just one that I thought should have some more eyes on it. Thanks! Jay On 7/14/2020 8:55 AM, Brian Rosmaita wrote: > cinderlib is on a cycle-trailing release model and the Ussuri release > is coming up soon.  Because it's still a new project, I thought I'd > send a reminder in case it fell off your radar.  These patches need to > merge before we cut the release: > > https://review.opendev.org/720553 > https://review.opendev.org/738226 > https://review.opendev.org/738473 > https://review.opendev.org/739190 > https://review.opendev.org/738230 > https://review.opendev.org/738866 > https://review.opendev.org/738472 > https://review.opendev.org/738213 > > They each have a single +2 at the moment, and they are all short, > focused patches. > > cheers, > brian > > From gagehugo at gmail.com Tue Jul 14 14:54:17 2020 From: gagehugo at gmail.com (Gage Hugo) Date: Tue, 14 Jul 2020 09:54:17 -0500 Subject: [security] Security SIG meeting July 16th 2020 canceled Message-ID: Hello everyone, The security sig meeting this week will be cancelled due to the 10 years of openstack celebration happening at the same time. We will meet next week at the scheduled time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Tue Jul 14 14:58:48 2020 From: marios at redhat.com (Marios Andreou) Date: Tue, 14 Jul 2020 17:58:48 +0300 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1000 I thought he already was core ? used to be? On Tue, Jul 14, 2020 at 4:31 PM Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobreli at redhat.com Tue Jul 14 15:10:12 2020 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Tue, 14 Jul 2020 17:10:12 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: On 7/14/20 3:30 PM, Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components over the > last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and Python code > into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always helpful to > the community and here to help. > He has provided thorough reviews in principal components on TripleO as well as a > lot of bug fixes or new features; which contributed to make TripleO more stable > and scalable. I would like to propose him be part of the TripleO core team. > > Thanks Rabi for your hard work! +1 > -- > Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando From katalsupriya36 at gmail.com Tue Jul 14 07:53:20 2020 From: katalsupriya36 at gmail.com (supriya katal) Date: Tue, 14 Jul 2020 13:23:20 +0530 Subject: Cloud Computing Resource Message-ID: Hello Team I have checked your sites. https://github.com/openstacknetsdk/openstack.net/wiki/Getting-Started-With-The-OpenStack-NET-SDK For using this sdk, one needs to create an account for *RECKSPACE *open cloud. I have an account in STACKPATH https://control.stackpath.com/ Can I use a stackpath storage object for uploading and accessing the files? I have tried to use your api for uploading and accessing files of STACKPATH object storage. https://docs.openstack.org/api-ref/object-store/index.html?expanded=create-or-replace-object-detail#objects but I got an error of Access Denied. -------------- next part -------------- An HTML attachment was scrubbed... URL: From berrange at redhat.com Tue Jul 14 10:21:29 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 14 Jul 2020 11:21:29 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200713232957.GD5955@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> Message-ID: <20200714102129.GD25187@redhat.com> On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > hi folks, > we are defining a device migration compatibility interface that helps upper > layer stack like openstack/ovirt/libvirt to check if two devices are > live migration compatible. > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > e.g. we could use it to check whether > - a src MDEV can migrate to a target MDEV, > - a src VF in SRIOV can migrate to a target VF in SRIOV, > - a src MDEV can migration to a target VF in SRIOV. > (e.g. SIOV/SRIOV backward compatibility case) > > The upper layer stack could use this interface as the last step to check > if one device is able to migrate to another device before triggering a real > live migration procedure. > we are not sure if this interface is of value or help to you. please don't > hesitate to drop your valuable comments. > > > (1) interface definition > The interface is defined in below way: > > __ userspace > /\ \ > / \write > / read \ > ________/__________ ___\|/_____________ > | migration_version | | migration_version |-->check migration > --------------------- --------------------- compatibility > device A device B > > > a device attribute named migration_version is defined under each device's > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > userspace tools read the migration_version as a string from the source device, > and write it to the migration_version sysfs attribute in the target device. > > The userspace should treat ANY of below conditions as two devices not compatible: > - any one of the two devices does not have a migration_version attribute > - error when reading from migration_version attribute of one device > - error when writing migration_version string of one device to > migration_version attribute of the other device > > The string read from migration_version attribute is defined by device vendor > driver and is completely opaque to the userspace. > for a Intel vGPU, string format can be defined like > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > for an NVMe VF connecting to a remote storage. it could be > "PCI ID" + "driver version" + "configured remote storage URL" > > for a QAT VF, it may be > "PCI ID" + "driver version" + "supported encryption set". > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > (2) backgrounds > > The reason we hope the migration_version string is opaque to the userspace > is that it is hard to generalize standard comparing fields and comparing > methods for different devices from different vendors. > Though userspace now could still do a simple string compare to check if > two devices are compatible, and result should also be right, it's still > too limited as it excludes the possible candidate whose migration_version > string fails to be equal. > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > with another MDEV with mdev_type_3, aggregator count 1, even their > migration_version strings are not equal. > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > besides that, driver version + configured resources are all elements demanding > to take into account. > > So, we hope leaving the freedom to vendor driver and let it make the final decision > in a simple reading from source side and writing for test in the target side way. > > > we then think the device compatibility issues for live migration with assigned > devices can be divided into two steps: > a. management tools filter out possible migration target devices. > Tags could be created according to info from product specification. > we think openstack/ovirt may have vendor proprietary components to create > those customized tags for each product from each vendor. > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > search target vGPU are like: > a tag for compatible parent PCI IDs, > a tag for a range of gvt driver versions, > a tag for a range of mdev type + aggregator count > > for NVMe VF, the tags to search target VF may be like: > a tag for compatible PCI IDs, > a tag for a range of driver versions, > a tag for URL of configured remote storage. Requiring management application developers to figure out this possible compatibility based on prod specs is really unrealistic. Product specs are typically as clear as mud, and with the suggestion we consider different rules for different types of devices, add up to a huge amount of complexity. This isn't something app developers should have to spend their time figuring out. The suggestion that we make use of vendor proprietary helper components is totally unacceptable. We need to be able to build a solution that works with exclusively an open source software stack. IMHO there needs to be a mechanism for the kernel to report via sysfs what versions are supported on a given device. This puts the job of reporting compatible versions directly under the responsibility of the vendor who writes the kernel driver for it. They are the ones with the best knowledge of the hardware they've built and the rules around its compatibility. > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > device migration compatibility interface to make sure the two devices are > indeed live migration compatible before launching the real live migration > process to start stream copying, src device stopping and target device > resuming. > It is supposed that this step would not bring any performance penalty as > -in kernel it's just a simple string decoding and comparing > -in openstack/ovirt, it could be done by extending current function > check_can_live_migrate_destination, along side claiming target resources.[1] > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > Thanks > Yan > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From smooney at redhat.com Tue Jul 14 12:33:24 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 14 Jul 2020 13:33:24 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714102129.GD25187@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> Message-ID: On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote: > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > hi folks, > > we are defining a device migration compatibility interface that helps upper > > layer stack like openstack/ovirt/libvirt to check if two devices are > > live migration compatible. > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > e.g. we could use it to check whether > > - a src MDEV can migrate to a target MDEV, mdev live migration is completely possible to do but i agree with Dan barrange's comments from the point of view of openstack integration i dont see calling out to a vender sepecific tool to be an accpetable solutions for device compatiablity checking. the sys filesystem that describs the mdevs that can be created shoudl also contain the relevent infomation such taht nova could integrate it via libvirt xml representation or directly retrive the info from sysfs. > > - a src VF in SRIOV can migrate to a target VF in SRIOV, so vf to vf migration is not possible in the general case as there is no standarised way to transfer teh device state as part of the siorv specs produced by the pci-sig as such there is not vender neutral way to support sriov live migration. > > - a src MDEV can migration to a target VF in SRIOV. that also makes this unviable > > (e.g. SIOV/SRIOV backward compatibility case) > > > > The upper layer stack could use this interface as the last step to check > > if one device is able to migrate to another device before triggering a real > > live migration procedure. well actully that is already too late really. ideally we would want to do this compaiablity check much sooneer to avoid the migration failing. in an openstack envionment at least by the time we invoke libvirt (assuming your using the libvirt driver) to do the migration we have alreaedy finished schduling the instance to the new host. if if we do the compatiablity check at this point and it fails then the live migration is aborted and will not be retired. These types of late check lead to a poor user experince as unless you check the migration detial it basically looks like the migration was ignored as it start to migrate and then continuge running on the orgininal host. when using generic pci passhotuhg with openstack, the pci alias is intended to reference a single vendor id/product id so you will have 1+ alias for each type of device. that allows openstack to schedule based on the availability of a compatibale device because we track inventories of pci devices and can query that when selecting a host. if we were to support mdev live migration in the future we would want to take the same declarative approch. 1 interospec the capability of the deivce we manage 2 create inventories of the allocatable devices and there capabilities 3 schdule the instance to a host based on the device-type/capabilities and claim it atomicly to prevent raceces 4 have the lower level hyperviors do addtional validation if need prelive migration. this proposal seams to be targeting extending step 4 where as ideally we should focuse on providing the info that would be relevant in set 1 preferably in a vendor neutral way vai a kernel interface like /sys. > > we are not sure if this interface is of value or help to you. please don't > > hesitate to drop your valuable comments. > > > > > > (1) interface definition > > The interface is defined in below way: > > > > __ userspace > > /\ \ > > / \write > > / read \ > > ________/__________ ___\|/_____________ > > | migration_version | | migration_version |-->check migration > > --------------------- --------------------- compatibility > > device A device B > > > > > > a device attribute named migration_version is defined under each device's > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). this might be useful as we could tag the inventory with the migration version and only might to devices with the same version > > userspace tools read the migration_version as a string from the source device, > > and write it to the migration_version sysfs attribute in the target device. this would not be useful as the schduler cannot directlly connect to the compute host and even if it could it would be extreamly slow to do this for 1000s of hosts and potentally multiple devices per host. > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > - any one of the two devices does not have a migration_version attribute > > - error when reading from migration_version attribute of one device > > - error when writing migration_version string of one device to > > migration_version attribute of the other device > > > > The string read from migration_version attribute is defined by device vendor > > driver and is completely opaque to the userspace. opaque vendor specific stings that higher level orchestros have to pass form host to host and cant reason about are evil, when allowed they prolifroate and makes any idea of a vendor nutral abstraction and interoperablity between systems impossible to reason about. that said there is a way to make it opaue but still useful to userspace. see below > > for a Intel vGPU, string format can be defined like > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > for an NVMe VF connecting to a remote storage. it could be > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > for a QAT VF, it may be > > "PCI ID" + "driver version" + "supported encryption set". > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) honestly i would much prefer if the version string was just a semver string. e.g. {major}.{minor}.{bugfix} if you do a driver/frimware update and break compatiablity with an older version bump the major version. if you add optional a feature that does not break backwards compatiablity if you migrate an older instance to the new host then just bump the minor/feature number. if you have a fix for a bug that does not change the feature set or compatiblity backwards or forwards then bump the bugfix number then the check is as simple as 1.) is the mdev type the same 2.) is the major verion the same 3.) am i going form the same version to same version or same version to newer version if all 3 are true we can migrate. e.g. 2.0.1 -> 2.1.1 (ok same major version and migrating from older feature release to newer feature release) 2.1.1 -> 2.0.1 (not ok same major version and migrating from new feature release to old feature release may be incompatable) 2.0.0 -> 3.0.0 (not ok chaning major version) 2.0.1 -> 2.0.0 (ok same major and minor version, all bugfixs in the same minor release should be compatibly) we dont need vendor to rencode the driver name or vendor id and product id in the string. that info is alreay available both to the device driver and to userspace via /sys already we just need to know if version of the same mdev are compatiable so a simple semver version string which is well know in the software world at least is a clean abstration we can reuse. > > (2) backgrounds > > > > The reason we hope the migration_version string is opaque to the userspace > > is that it is hard to generalize standard comparing fields and comparing > > methods for different devices from different vendors. > > Though userspace now could still do a simple string compare to check if > > two devices are compatible, and result should also be right, it's still > > too limited as it excludes the possible candidate whose migration_version > > string fails to be equal. > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > with another MDEV with mdev_type_3, aggregator count 1, even their > > migration_version strings are not equal. > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > besides that, driver version + configured resources are all elements demanding > > to take into account. > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > in a simple reading from source side and writing for test in the target side way. > > > > > > we then think the device compatibility issues for live migration with assigned > > devices can be divided into two steps: > > a. management tools filter out possible migration target devices. > > Tags could be created according to info from product specification. > > we think openstack/ovirt may have vendor proprietary components to create > > those customized tags for each product from each vendor. > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > search target vGPU are like: > > a tag for compatible parent PCI IDs, > > a tag for a range of gvt driver versions, > > a tag for a range of mdev type + aggregator count > > > > for NVMe VF, the tags to search target VF may be like: > > a tag for compatible PCI IDs, > > a tag for a range of driver versions, > > a tag for URL of configured remote storage. > > Requiring management application developers to figure out this possible > compatibility based on prod specs is really unrealistic. Product specs > are typically as clear as mud, and with the suggestion we consider > different rules for different types of devices, add up to a huge amount > of complexity. This isn't something app developers should have to spend > their time figuring out. > > The suggestion that we make use of vendor proprietary helper components > is totally unacceptable. We need to be able to build a solution that > works with exclusively an open source software stack. > > IMHO there needs to be a mechanism for the kernel to report via sysfs > what versions are supported on a given device. This puts the job of > reporting compatible versions directly under the responsibility of the > vendor who writes the kernel driver for it. They are the ones with the > best knowledge of the hardware they've built and the rules around its > compatibility. yep totally agree with that statement. > > > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > > device migration compatibility interface to make sure the two devices are > > indeed live migration compatible before launching the real live migration > > process to start stream copying, src device stopping and target device > > resuming. > > It is supposed that this step would not bring any performance penalty as > > -in kernel it's just a simple string decoding and comparing > > -in openstack/ovirt, it could be done by extending current function > > check_can_live_migrate_destination, along side claiming target resources.[1] that is a compute driver fucntion https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/virt/driver.py#L1261-L1278 that is called in the conductor here https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/conductor/tasks/live_migrate.py#L360-L364 if the check fails(ignoreing the fact its expensive to do an rpc to the compute host) we raise an excption that move on to the next host in the alternate host list. https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/conductor/tasks/live_migrate.py#L556-L567 by default the alternate host list is 3 https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.max_attempts so there would be a pretty high likely hood that if we only checked compatiablity at this point it would fail to migrate. realistically speaking this is too late. we can do a final safty check at this point but this should not be the first time we check compatibility. at a mimnium we would have wanted to select a host with the same mdev type first, we can do that from the info we have today but i hope i have made the point that declaritive interfacs which we can introspect without haveing opaqce vender sepecitic blob are vastly more consomable then imperitive interfaces we have to probe. form a security and packaging point of view this is better too as if i only need readonly access to sysfs instead of write access and if i dont need to package a bunch of addtion vendor tools in a continerised deployment that significantly decreases the potential attack surface. > > > > > > > > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > > > Thanks > > Yan > > > > Regards, > Daniel From ionut at fleio.com Tue Jul 14 14:00:52 2020 From: ionut at fleio.com (Ionut Biru) Date: Tue, 14 Jul 2020 17:00:52 +0300 Subject: [ceilometer][octavia] polling meters In-Reply-To: References: Message-ID: Hi, Thanks for the information. I made it work by using only one attribute at that time the error was something related to the type of an attribute and I got rid of it. On Fri, Jul 10, 2020 at 8:24 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > Sure, this is a minimalistic config I used for testing (watch for the > indentation issues that might happen due to copy/paste into Gmail). > >> cat ceilometer/pollsters.d/vpn-connection-dynamic-pollster.yaml >> --- >> >> - name: "dynamic_pollster.network.services.vpn.connection" >> sample_type: "gauge" >> unit: "ipsec_site_connection" >> value_attribute: "status" >> endpoint_type: "network" >> url_path: "v2.0/vpn/ipsec-site-connections" >> metadata_fields: >> - "name" >> - "vpnservice_id" >> - "description" >> - "status" >> - "peer_address" >> value_mapping: >> ACTIVE: "1" >> DOWN: "0" >> metadata_mapping: >> name: "display_name" >> default_value: 0 >> > > Then, the polling.yaml file > > cat ceilometer/polling.yaml | grep -A 3 vpnass >> - name: vpnass_pollsters >> interval: 600 >> meters: >> - dynamic_pollster.network.services.vpn.connection >> > > And last, but not least, the custom_gnocchi_resources file. > >> cat ceilometer/custom_gnocchi_resources.yaml | grep -B 2 -A 9 >> "dynamic_pollster.network.services.vpn.connection" >> - resource_type: s2svpn >> metrics: >> dynamic_pollster.network.services.vpn.connection: >> attributes: >> name: resource_metadata.name >> vpnservice_id: resource_metadata.vpnservice_id >> description: resource_metadata.description >> status: resource_metadata.status >> peer_address: resource_metadata.peer_address >> display_name: resource_metadata.display_name >> > > Bear in mind that you need to create the Gnocchi resource type. > >> gnocchi resource-type show s2svpn >> >> +--------------------------+-----------------------------------------------------------+ >> | Field | Value >> | >> >> +--------------------------+-----------------------------------------------------------+ >> | attributes/description | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/display_name | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/name | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/peer_address | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/status | max_length=255, min_length=0, >> required=False, type=string | >> | attributes/vpnservice_id | required=False, type=uuid >> | >> | name | s2svpn >> | >> | state | active >> | >> >> +--------------------------+-----------------------------------------------------------+ >> > > What is the problem you are having? > > On Fri, Jul 10, 2020 at 10:50 AM Ionut Biru wrote: > >> Hi again, >> >> I did not manage to make it work, I cannot figure out how to connect all >> the pieces. >> >> pollsters.d/octavia.yaml https://paste.xinu.at/DERxh1/ >> pipeline.yaml https://paste.xinu.at/u1E42/ >> polling.yaml https://paste.xinu.at/MZWNs/ >> gnocchi_resources.yaml https://paste.xinu.at/j3AX/ >> gnocchi_client.py in resources_update_operations >> https://paste.xinu.at/no5/ >> gnocchi resource-type show https://paste.xinu.at/7mZIyZ/ >> Do you mind if you do a full example >> using "dynamic.network.services.vpn.connection" from >> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >> ? >> >> Or maybe you can point me to the mistakes made in my configuration? >> >> >> On Tue, Jul 7, 2020 at 2:43 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> That is the right direction. I don't know why people hard-coded the >>> initial pollsters' configs and did not document the relation between >>> Gnocchi and Ceilometer properly. They (Ceilometer and Gnocchi) are not a >>> single system, but interdependent systems to implement a monitoring >>> solution. Ceilometer is the component that gathers data/information, >>> processes, and then persists it somewhere. Gnocchi is one of the options >>> that Ceilometer can use to persist data. By default, Ceilometer creates >>> some basic configurations in Gnocchi to store data, such as some default >>> resource-types with default attributes. However, we do not need (should >>> not) rely on this default config. >>> >>> You can create and use custom resources to fit the stack to your needs. >>> This can be achieved via `gnocchi resource-type create -a >>> :: ` and >>> `gnocchi resource-type create -u >>> :: `. >>> Then, in the `custom_gnocchi_resources.yaml` (if you use Kolla-ansible), >>> you can customize the mapping of metrics to resource-types in Gnocchi. >>> >>> On Tue, Jul 7, 2020 at 7:49 AM Ionut Biru wrote: >>> >>>> Hello again, >>>> >>>> What's the proper way to handle dynamic pollsters in gnocchi ? >>>> Right now ceilometer returns: >>>> >>>> WARNING ceilometer.publisher.gnocchi [-] metric dynamic.network.octavia >>>> is not handled by Gnocchi >>>> >>>> I found >>>> https://docs.openstack.org/ceilometer/latest/contributor/new_resource_types.html >>>> but I'm not sure if is the right direction. >>>> >>>> On Tue, Jul 7, 2020 at 10:52 AM Ionut Biru wrote: >>>> >>>>> Seems to work fine now. Thanks. >>>>> >>>>> On Mon, Jul 6, 2020 at 8:12 PM Rafael Weingärtner < >>>>> rafaelweingartner at gmail.com> wrote: >>>>> >>>>>> It looks like a coding error that we left behind during a major >>>>>> refactoring that we introduced upstream. >>>>>> I created a patch for it. Can you check/review and test it? >>>>>> https://review.opendev.org/739555 >>>>>> >>>>>> On Mon, Jul 6, 2020 at 11:17 AM Ionut Biru wrote: >>>>>> >>>>>>> Hi Rafael, >>>>>>> >>>>>>> I have an error and I cannot resolve it myself. >>>>>>> >>>>>>> https://paste.xinu.at/LEfdXD/ >>>>>>> >>>>>>> Do you happen to know what's wrong? >>>>>>> >>>>>>> endpoint list https://paste.xinu.at/v3j1jl/ >>>>>>> octavia.yaml https://paste.xinu.at/TIxfOz/ >>>>>>> polling.yaml https://paste.xinu.at/oBEFj/ >>>>>>> pipeline.yaml https://paste.xinu.at/qvEdTX/ >>>>>>> >>>>>>> >>>>>>> On Sat, Jul 4, 2020 at 1:10 AM Rafael Weingärtner < >>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>> >>>>>>>> Good catch. I fixed the docs. >>>>>>>> https://review.opendev.org/#/c/739288/ >>>>>>>> >>>>>>>> On Fri, Jul 3, 2020 at 1:59 PM Ionut Biru wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I just noticed that the example >>>>>>>>> dynamic.network.services.vpn.connection from >>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html has >>>>>>>>> the wrong indentation. >>>>>>>>> This https://paste.xinu.at/6PTfsM/ is loaded without any error. >>>>>>>>> >>>>>>>>> Now I have to see why is not polling from it >>>>>>>>> >>>>>>>>> On Fri, Jul 3, 2020 at 7:19 PM Ionut Biru wrote: >>>>>>>>> >>>>>>>>>> Hi Rafael, >>>>>>>>>> >>>>>>>>>> I think I applied all the reviews successfully but I tried to do >>>>>>>>>> an octavia dynamic poller but I have couples of errors. >>>>>>>>>> >>>>>>>>>> Here is the octavia.yaml: https://paste.xinu.at/kDN6SV/ >>>>>>>>>> Error is about syntax error near name: >>>>>>>>>> https://paste.xinu.at/MHgDBY/ >>>>>>>>>> >>>>>>>>>> if i remove the - in front of name like this: >>>>>>>>>> https://paste.xinu.at/K7s5I8/ >>>>>>>>>> The error is different this time: https://paste.xinu.at/zWdC0U/ >>>>>>>>>> >>>>>>>>>> Is there something I missed or is something wrong in yaml? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jul 2, 2020 at 5:50 PM Rafael Weingärtner < >>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I would say so. We are lacking people to review and then merge >>>>>>>>>>> it. >>>>>>>>>>> >>>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>>> production? >>>>>>>>>>>> >>>>>>>>>>> As long as the person executing the cherry-picks, and >>>>>>>>>>> maintaining the code knows what she/he is doing, you should be safe. The >>>>>>>>>>> guys that are using this implementation (and others that I and my >>>>>>>>>>> colleagues proposed), have a few openstack components that are customized >>>>>>>>>>> with the patches/enhancements/extensions we developed so far; this means, >>>>>>>>>>> they are not using the community version, but something in-between (the >>>>>>>>>>> community releases + the patches we did). Of course, it is only possible, >>>>>>>>>>> because we are the ones creating and maintaining these codes; therefore, we >>>>>>>>>>> can assure quality for production. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 2, 2020 at 9:43 AM Ionut Biru >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello Rafael, >>>>>>>>>>>> >>>>>>>>>>>> Since the merging window for ussuri was long passed for those >>>>>>>>>>>> commits, is it safe to assume that it will not land in stable/ussuri at all >>>>>>>>>>>> and those will be available for victoria? >>>>>>>>>>>> >>>>>>>>>>>> How safe is to cherry pick those commits and use them in >>>>>>>>>>>> production? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Apr 24, 2020 at 3:06 PM Rafael Weingärtner < >>>>>>>>>>>> rafaelweingartner at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The dynamic pollster in Ceilometer will be first released in >>>>>>>>>>>>> Ussuri. However, there are some important PRs still waiting for a merge, >>>>>>>>>>>>> that might be important for your use case: >>>>>>>>>>>>> * https://review.opendev.org/#/c/722092/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/715180/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/715289/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/679999/ >>>>>>>>>>>>> * https://review.opendev.org/#/c/709807/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Apr 24, 2020 at 8:18 AM Carlos Goncalves < >>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:20 PM Ionut Biru >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I want to meter the loadbalancer into gnocchi for billing >>>>>>>>>>>>>>> purposes in stein/train and ceilometer doesn't support dynamic pollsters. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think I misunderstood your use case, sorry. I read it as if >>>>>>>>>>>>>> you wanted to know "if a loadbalancer was deployed and has status active". >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Until I upgrade to Ussuri, is there a way to accomplish this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm not sure Ceilometer supports it even in Ussuri. I'll >>>>>>>>>>>>>> defer to the Ceilometer project. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 12:45 PM Carlos Goncalves < >>>>>>>>>>>>>>> cgoncalves at redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Ionut, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Apr 24, 2020 at 11:27 AM Ionut Biru < >>>>>>>>>>>>>>>> ionut at fleio.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello guys, >>>>>>>>>>>>>>>>> I was trying to add in polling.yaml and pipeline from >>>>>>>>>>>>>>>>> ceilometer the following: >>>>>>>>>>>>>>>>> - network.services.lb.active.connections >>>>>>>>>>>>>>>>> - network.services.lb.health_monitor >>>>>>>>>>>>>>>>> - network.services.lb.incoming.bytes >>>>>>>>>>>>>>>>> - network.services.lb.listener >>>>>>>>>>>>>>>>> - network.services.lb.loadbalancer >>>>>>>>>>>>>>>>> - network.services.lb.member >>>>>>>>>>>>>>>>> - network.services.lb.outgoing.bytes >>>>>>>>>>>>>>>>> - network.services.lb.pool >>>>>>>>>>>>>>>>> - network.services.lb.total.connections >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> But it doesn't work, I think they are for the old lbs that >>>>>>>>>>>>>>>>> were supported in neutron. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I found >>>>>>>>>>>>>>>>> https://docs.openstack.org/ceilometer/latest/admin/telemetry-dynamic-pollster.html >>>>>>>>>>>>>>>>> but this is not available in stein or train. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was wondering if there is a way to meter >>>>>>>>>>>>>>>>> loadbalancers from octavia. >>>>>>>>>>>>>>>>> I mostly want for start to just meter if a loadbalancer >>>>>>>>>>>>>>>>> was deployed and has status active. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You can get the provisioning and operating status of >>>>>>>>>>>>>>>> Octavia load balancers via the Octavia API. There is also an API endpoint >>>>>>>>>>>>>>>> that returns the full load balancer status tree [1]. >>>>>>>>>>>>>>>> Additionally, Octavia has three API endpoints for >>>>>>>>>>>>>>>> statistics [2][3][4]. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I hope this helps with your use case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> Carlos >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-the-load-balancer-status-tree-detail#get-the-load-balancer-status-tree >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-load-balancer-statistics-detail#get-load-balancer-statistics >>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=get-listener-statistics-detail#get-listener-statistics >>>>>>>>>>>>>>>> [4] >>>>>>>>>>>>>>>> https://docs.openstack.org/api-ref/load-balancer/v2/?expanded=show-amphora-statistics-detail#show-amphora-statistics >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Rafael Weingärtner >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ionut Biru - https://fleio.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Rafael Weingärtner >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ionut Biru - https://fleio.com >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Rafael Weingärtner >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ionut Biru - https://fleio.com >>>>> >>>> >>>> >>>> -- >>>> Ionut Biru - https://fleio.com >>>> >>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> Ionut Biru - https://fleio.com >> > > > -- > Rafael Weingärtner > -- Ionut Biru - https://fleio.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jul 14 15:41:02 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 14 Jul 2020 15:41:02 +0000 Subject: Cloud Computing Resource In-Reply-To: References: Message-ID: <20200714154102.d5zso2liey6ztmzf@yuggoth.org> On 2020-07-14 13:23:20 +0530 (+0530), supriya katal wrote: > I have checked your sites. > > https://github.com/openstacknetsdk/openstack.net/wiki/Getting-Started-With-The-OpenStack-NET-SDK [...] Contrary to its name, that does not appear to have been created by the OpenStack community. Their documentation indicates you should contact sdk-support at rackspace.com with any questions. It also looks like the most recent release for it was 4 years ago, and the most recent commit to merge in their default Git branch was from two years ago, so I would not be surprised if they're no longer actively maintaining it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From whayutin at redhat.com Tue Jul 14 15:50:48 2020 From: whayutin at redhat.com (Wesley Hayutin) Date: Tue, 14 Jul 2020 09:50:48 -0600 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: On Tue, Jul 14, 2020 at 9:11 AM Bogdan Dobrelya wrote: > On 7/14/20 3:30 PM, Emilien Macchi wrote: > > Hi folks, > > > > Rabi has proved deep technical understanding on the TripleO components > over the > > last years. > > Initially as a major maintainer of the Heat project and then a regular > > contributor to TripleO, he got involved at different levels: > > - Optimization of the Heat templates, to reduce the number of resources > or > > improve them to make it faster and more efficient at scale. > > - Migration of the Mistral workflows into native Ansible modules and > Python code > > into tripleo-common, with end-to-end expertise. > > - Regular contributions to the container tooling integration. > > > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to > > the community and here to help. > > He has provided thorough reviews in principal components on TripleO as > well as a > > lot of bug fixes or new features; which contributed to make TripleO more > stable > > and scalable. I would like to propose him be part of the TripleO core > team. > > > > Thanks Rabi for your hard work! > > +1 > > > -- > > Emilien Macchi > > Thanks for raising this Emilien!! Thank you to Rabi for your excellent work! +1 > > -- > Best regards, > Bogdan Dobrelya, > Irc #bogdando > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.williamson at redhat.com Tue Jul 14 16:16:16 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 10:16:16 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714102129.GD25187@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> Message-ID: <20200714101616.5d3a9e75@x1.home> On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrangé wrote: > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > hi folks, > > we are defining a device migration compatibility interface that helps upper > > layer stack like openstack/ovirt/libvirt to check if two devices are > > live migration compatible. > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > e.g. we could use it to check whether > > - a src MDEV can migrate to a target MDEV, > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > - a src MDEV can migration to a target VF in SRIOV. > > (e.g. SIOV/SRIOV backward compatibility case) > > > > The upper layer stack could use this interface as the last step to check > > if one device is able to migrate to another device before triggering a real > > live migration procedure. > > we are not sure if this interface is of value or help to you. please don't > > hesitate to drop your valuable comments. > > > > > > (1) interface definition > > The interface is defined in below way: > > > > __ userspace > > /\ \ > > / \write > > / read \ > > ________/__________ ___\|/_____________ > > | migration_version | | migration_version |-->check migration > > --------------------- --------------------- compatibility > > device A device B > > > > > > a device attribute named migration_version is defined under each device's > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > userspace tools read the migration_version as a string from the source device, > > and write it to the migration_version sysfs attribute in the target device. > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > - any one of the two devices does not have a migration_version attribute > > - error when reading from migration_version attribute of one device > > - error when writing migration_version string of one device to > > migration_version attribute of the other device > > > > The string read from migration_version attribute is defined by device vendor > > driver and is completely opaque to the userspace. > > for a Intel vGPU, string format can be defined like > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > for an NVMe VF connecting to a remote storage. it could be > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > for a QAT VF, it may be > > "PCI ID" + "driver version" + "supported encryption set". > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) It's very strange to define it as opaque and then proceed to describe the contents of that opaque string. The point is that its contents are defined by the vendor driver to describe the device, driver version, and possibly metadata about the configuration of the device. One instance of a device might generate a different string from another. The string that a device produces is not necessarily the only string the vendor driver will accept, for example the driver might support backwards compatible migrations. > > (2) backgrounds > > > > The reason we hope the migration_version string is opaque to the userspace > > is that it is hard to generalize standard comparing fields and comparing > > methods for different devices from different vendors. > > Though userspace now could still do a simple string compare to check if > > two devices are compatible, and result should also be right, it's still > > too limited as it excludes the possible candidate whose migration_version > > string fails to be equal. > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > with another MDEV with mdev_type_3, aggregator count 1, even their > > migration_version strings are not equal. > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > besides that, driver version + configured resources are all elements demanding > > to take into account. > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > in a simple reading from source side and writing for test in the target side way. > > > > > > we then think the device compatibility issues for live migration with assigned > > devices can be divided into two steps: > > a. management tools filter out possible migration target devices. > > Tags could be created according to info from product specification. > > we think openstack/ovirt may have vendor proprietary components to create > > those customized tags for each product from each vendor. > > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > search target vGPU are like: > > a tag for compatible parent PCI IDs, > > a tag for a range of gvt driver versions, > > a tag for a range of mdev type + aggregator count > > > > for NVMe VF, the tags to search target VF may be like: > > a tag for compatible PCI IDs, > > a tag for a range of driver versions, > > a tag for URL of configured remote storage. I interpret this as hand waving, ie. the first step is for management tools to make a good guess :-\ We don't seem to be willing to say that a given mdev type can only migrate to a device with that same type. There's this aggregation discussion happening separately where a base mdev type might be created or later configured to be equivalent to a different type. The vfio migration API we've defined is also not limited to mdev devices, for example we could create vendor specific quirks or hooks to provide migration support for a physical PF/VF device. Within the realm of possibility then is that we could migrate between a physical device and an mdev device, which are simply different degrees of creating a virtualization layer in front of the device. > Requiring management application developers to figure out this possible > compatibility based on prod specs is really unrealistic. Product specs > are typically as clear as mud, and with the suggestion we consider > different rules for different types of devices, add up to a huge amount > of complexity. This isn't something app developers should have to spend > their time figuring out. Agreed. > The suggestion that we make use of vendor proprietary helper components > is totally unacceptable. We need to be able to build a solution that > works with exclusively an open source software stack. I'm surprised to see this as well, but I'm not sure if Yan was really suggesting proprietary software so much as just vendor specific knowledge. > IMHO there needs to be a mechanism for the kernel to report via sysfs > what versions are supported on a given device. This puts the job of > reporting compatible versions directly under the responsibility of the > vendor who writes the kernel driver for it. They are the ones with the > best knowledge of the hardware they've built and the rules around its > compatibility. The version string discussed previously is the version string that represents a given device, possibly including driver information, configuration, etc. I think what you're asking for here is an enumeration of every possible version string that a given device could accept as an incoming migration stream. If we consider the string as opaque, that means the vendor driver needs to generate a separate string for every possible version it could accept, for every possible configuration option. That potentially becomes an excessive amount of data to either generate or manage. Am I overestimating how vendors intend to use the version string? We'd also need to consider devices that we could create, for instance providing the same interface enumeration prior to creating an mdev device to have a confidence level that the new device would be a valid target. We defined the string as opaque to allow vendor flexibility and because defining a common format is hard. Do we need to revisit this part of the discussion to define the version string as non-opaque with parsing rules, probably with separate incoming vs outgoing interfaces? Thanks, Alex From dev.faz at gmail.com Tue Jul 14 16:44:05 2020 From: dev.faz at gmail.com (Fabian Zimmermann) Date: Tue, 14 Jul 2020 18:44:05 +0200 Subject: [octavia] Replace broken amphoras In-Reply-To: References: Message-ID: Hi, Am Di., 14. Juli 2020 um 02:04 Uhr schrieb Michael Johnson < johnsomor at gmail.com>: > Sorry you have run into trouble and we have missed you in the IRC channel. > Thanks for your great work and support! > Yeah, that transcript from three years ago isn't going to be much help. > Arg. > A few things we will want to know are: > 1. What version of Octavia are you using? > 3.1.0 > 2. Do you have the DNS extension to neutron enabled? > yes > 3. When it said "unable to attach port to amphora", can you provide > the full error? Was it due to a hostname mismatch error from nova? > arg, debug logs got already rotated. I will repeat my debug-session and paste the output. Any suggestions what I should do? Maybe I can already try something different? My guess is you ran into the issue where a port will not attach if the > DNS name doesn't match. Our workaround for that accidentally got > removed and re-added in https://review.opendev.org/#/c/663277/. > So, this should already be fixed in stable/rocky. Should upgrading octavia to latest stable/rocky be enough to get my amphoras working again? Replacing a vrrp_port is tricky, so I'm not surprised you ran into > some trouble. Can you please provide the controller worker log output > when doing a load balancer failover (let's not use amphora failover > here) on paste.openstack.org? You can mark it private and directly > reply to me if you have concerns about the log content. > Will provide this asap. > All this said, I have recently completely refactored the failover > flows recently. This has already merged on the master branch and > backports are in process. > Thanks a lot, Fabian -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Tue Jul 14 16:40:11 2020 From: thomas.king at gmail.com (Thomas King) Date: Tue, 14 Jul 2020 10:40:11 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: I have. That's the Triple-O docs and they don't go through the normal .conf files to explain how it works outside of Triple-O. It has some ideas but no running configurations. Tom King On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis wrote: > hi, have you checked: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html > ? > I am following this link. I only have one network, having different issues > tho ;) > > > > On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: > >> Thank you, Amy! >> >> Tom >> >> On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: >> >>> Hey Tom, >>> >>> Adding the OpenStack discuss list as I think you got several replies >>> from there as well. >>> >>> Thanks, >>> >>> Amy (spotz) >>> >>> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >>> wrote: >>> >>>> Good day, >>>> >>>> I'm bringing up a thread from June about DHCP relay with neutron >>>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>>> not have the plain config/neutron config to show how a regular Ironic setup >>>> would use DHCP relay. >>>> >>>> The Neutron segments docs state that I must have a unique physical >>>> network name. If my Ironic controller has a single provisioning network >>>> with a single physical network name, doesn't this prevent my use of >>>> multiple segments? >>>> >>>> Further, the segments docs state this: "The operator must ensure that >>>> every compute host that is supposed to participate in a router provider >>>> network has direct connectivity to one of its segments." (section 3 at >>>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>>> current docs state the same thing) >>>> This defeats the purpose of using DHCP relay, though, where the Ironic >>>> controller does *not* have direct connectivity to the remote segment. >>>> >>>> Here is a rough drawing - what is wrong with my thinking here? >>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP relay >>>> <------> Ironic controller, provisioning network: 10.146.29.192/26 >>>> VLAN 2115 >>>> >>>> Thank you, >>>> Tom King >>>> _______________________________________________ >>>> openstack-mentoring mailing list >>>> openstack-mentoring at lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>>> >>> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From berrange at redhat.com Tue Jul 14 16:47:22 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Tue, 14 Jul 2020 17:47:22 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714101616.5d3a9e75@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> Message-ID: <20200714164722.GL25187@redhat.com> On Tue, Jul 14, 2020 at 10:16:16AM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > what versions are supported on a given device. This puts the job of > > reporting compatible versions directly under the responsibility of the > > vendor who writes the kernel driver for it. They are the ones with the > > best knowledge of the hardware they've built and the rules around its > > compatibility. > > The version string discussed previously is the version string that > represents a given device, possibly including driver information, > configuration, etc. I think what you're asking for here is an > enumeration of every possible version string that a given device could > accept as an incoming migration stream. If we consider the string as > opaque, that means the vendor driver needs to generate a separate > string for every possible version it could accept, for every possible > configuration option. That potentially becomes an excessive amount of > data to either generate or manage. > > Am I overestimating how vendors intend to use the version string? If I'm interpreting your reply & the quoted text orrectly, the version string isn't really a version string in any normal sense of the word "version". Instead it sounds like string encoding a set of features in some arbitrary vendor specific format, which they parse and do compatibility checks on individual pieces ? One or more parts may contain a version number, but its much more than just a version. If that's correct, then I'd prefer we didn't call it a version string, instead call it a "capability string" to make it clear it is expressing a much more general concept, but... > We'd also need to consider devices that we could create, for instance > providing the same interface enumeration prior to creating an mdev > device to have a confidence level that the new device would be a valid > target. > > We defined the string as opaque to allow vendor flexibility and because > defining a common format is hard. Do we need to revisit this part of > the discussion to define the version string as non-opaque with parsing > rules, probably with separate incoming vs outgoing interfaces? Thanks, ..even if the huge amount of flexibility is technically relevant from the POV of the hardware/drivers, we should consider whether management apps actually want, or can use, that level of flexibility. The task of picking which host to place a VM on has alot of factors to consider, and when there are a large number of hosts, the total amount of information to check gets correspondingly large. The placement process is also fairly performance critical. Running complex algorithmic logic to check compatibility of devices based on a arbitrary set of rules is likely to be a performance challenge. A flat list of supported strings is a much simpler thing to check as it reduces down to a simple set membership test. IOW, even if there's some complex set of device type / vendor specific rules to check for compatibility, I fear apps will ignore them and just define a very simplified list of compatible string, and ignore all the extra flexibility. I'm sure OpenStack maintainers can speak to this more, as they've put alot of work into their scheduling engine to optimize the way it places VMs largely driven from simple structured data reported from hosts. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From alex.williamson at redhat.com Tue Jul 14 17:01:48 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 11:01:48 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> Message-ID: <20200714110148.0471c03c@x1.home> On Tue, 14 Jul 2020 13:33:24 +0100 Sean Mooney wrote: > On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote: > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > mdev live migration is completely possible to do but i agree with Dan barrange's comments > from the point of view of openstack integration i dont see calling out to a vender sepecific > tool to be an accpetable As I replied to Dan, I'm hoping Yan was referring more to vendor specific knowledge rather than actual tools. > solutions for device compatiablity checking. the sys filesystem > that describs the mdevs that can be created shoudl also > contain the relevent infomation such > taht nova could integrate it via libvirt xml representation or directly retrive the > info from > sysfs. > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > so vf to vf migration is not possible in the general case as there is no standarised > way to transfer teh device state as part of the siorv specs produced by the pci-sig > as such there is not vender neutral way to support sriov live migration. We're not talking about a general case, we're talking about physical devices which have vfio wrappers or hooks with device specific knowledge in order to support the vfio migration interface. The point is that a discussion around vfio device migration cannot be limited to mdev devices. > > > - a src MDEV can migration to a target VF in SRIOV. > that also makes this unviable > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to check > > > if one device is able to migrate to another device before triggering a real > > > live migration procedure. > well actully that is already too late really. ideally we would want to do this compaiablity > check much sooneer to avoid the migration failing. in an openstack envionment at least > by the time we invoke libvirt (assuming your using the libvirt driver) to do the migration we have alreaedy > finished schduling the instance to the new host. if if we do the compatiablity check at this point > and it fails then the live migration is aborted and will not be retired. These types of late check lead to a > poor user experince as unless you check the migration detial it basically looks like the migration was ignored > as it start to migrate and then continuge running on the orgininal host. > > when using generic pci passhotuhg with openstack, the pci alias is intended to reference a single vendor id/product > id so you will have 1+ alias for each type of device. that allows openstack to schedule based on the availability of a > compatibale device because we track inventories of pci devices and can query that when selecting a host. > > if we were to support mdev live migration in the future we would want to take the same declarative approch. > 1 interospec the capability of the deivce we manage > 2 create inventories of the allocatable devices and there capabilities > 3 schdule the instance to a host based on the device-type/capabilities and claim it atomicly to prevent raceces > 4 have the lower level hyperviors do addtional validation if need prelive migration. > > this proposal seams to be targeting extending step 4 where as ideally we should focuse on providing the info that would > be relevant in set 1 preferably in a vendor neutral way vai a kernel interface like /sys. I think this is reading a whole lot into the phrase "last step". We want to make the information available for a management engine to consume as needed to make informed decisions regarding likely compatible target devices. > > > we are not sure if this interface is of value or help to you. please don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each device's > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > this might be useful as we could tag the inventory with the migration version and only might to > devices with the same version Is cross version compatibility something that you'd consider using? > > > userspace tools read the migration_version as a string from the source device, > > > and write it to the migration_version sysfs attribute in the target device. > this would not be useful as the schduler cannot directlly connect to the compute host > and even if it could it would be extreamly slow to do this for 1000s of hosts and potentally > multiple devices per host. Seems similar to Dan's requirement, looks like the 'read for version, write for compatibility' test idea isn't really viable. > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > - any one of the two devices does not have a migration_version attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > opaque vendor specific stings that higher level orchestros have to pass form host > to host and cant reason about are evil, when allowed they prolifroate and > makes any idea of a vendor nutral abstraction and interoperablity between systems > impossible to reason about. that said there is a way to make it opaue but still useful > to userspace. see below > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > honestly i would much prefer if the version string was just a semver string. > e.g. {major}.{minor}.{bugfix} > > if you do a driver/frimware update and break compatiablity with an older version bump the > major version. > > if you add optional a feature that does not break backwards compatiablity if you migrate > an older instance to the new host then just bump the minor/feature number. > > if you have a fix for a bug that does not change the feature set or compatiblity backwards or > forwards then bump the bugfix number > > then the check is as simple as > 1.) is the mdev type the same > 2.) is the major verion the same > 3.) am i going form the same version to same version or same version to newer version > > if all 3 are true we can migrate. > e.g. > 2.0.1 -> 2.1.1 (ok same major version and migrating from older feature release to newer feature release) > 2.1.1 -> 2.0.1 (not ok same major version and migrating from new feature release to old feature release may be > incompatable) > 2.0.0 -> 3.0.0 (not ok chaning major version) > 2.0.1 -> 2.0.0 (ok same major and minor version, all bugfixs in the same minor release should be compatibly) What's the value of the bugfix field in this scheme? The simplicity is good, but is it too simple. It's not immediately clear to me whether all features can be hidden behind a minor version. For instance, if we have an mdev device that supports this notion of aggregation, which is proposed as a solution to the problem that physical hardware might support lots and lots of assignable interfaces which can be combined into arbitrary sets for mdev devices, making it impractical to expose an mdev type for every possible enumeration of assignable interfaces within a device. We therefore expose a base type where the aggregation is built later. This essentially puts us in a scenario where even within an mdev type running on the same driver, there are devices that are not directly compatible with each other. > we dont need vendor to rencode the driver name or vendor id and product id in the string. that info is alreay > available both to the device driver and to userspace via /sys already we just need to know if version of > the same mdev are compatiable so a simple semver version string which is well know in the software world > at least is a clean abstration we can reuse. This presumes there's no cross device migration. An mdev type can only be migrated to the same mdev type, all of the devices within that type have some based compatibility, a phsyical device can only be migrated to the same physical device. In the latter case what defines the type? If it's a PCI device, is it only vendor:device IDs? What about revision? What about subsystem IDs? What about possibly an onboard ROM or internal firmware? The information may be available, but which things are relevant to migration? We already see desires to allow migration between physical and mdev, but also to expose mdev types that might be composable to be compatible with other types. Thanks, Alex From sgolovat at redhat.com Tue Jul 14 17:03:55 2020 From: sgolovat at redhat.com (Sergii Golovatiuk) Date: Tue, 14 Jul 2020 19:03:55 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: Hi, +1. Thank you Rabi! вт, 14 июл. 2020 г. в 17:53, Wesley Hayutin : > > > On Tue, Jul 14, 2020 at 9:11 AM Bogdan Dobrelya > wrote: > >> On 7/14/20 3:30 PM, Emilien Macchi wrote: >> > Hi folks, >> > >> > Rabi has proved deep technical understanding on the TripleO components >> over the >> > last years. >> > Initially as a major maintainer of the Heat project and then a regular >> > contributor to TripleO, he got involved at different levels: >> > - Optimization of the Heat templates, to reduce the number of resources >> or >> > improve them to make it faster and more efficient at scale. >> > - Migration of the Mistral workflows into native Ansible modules and >> Python code >> > into tripleo-common, with end-to-end expertise. >> > - Regular contributions to the container tooling integration. >> > >> > Being involved on the mailing-list and IRC channels, Rabi is always >> helpful to >> > the community and here to help. >> > He has provided thorough reviews in principal components on TripleO as >> well as a >> > lot of bug fixes or new features; which contributed to make TripleO >> more stable >> > and scalable. I would like to propose him be part of the TripleO core >> team. >> > >> > Thanks Rabi for your hard work! >> >> +1 >> >> > -- >> > Emilien Macchi >> >> Thanks for raising this Emilien!! Thank you to Rabi for your excellent > work! > +1 > >> >> -- >> Best regards, >> Bogdan Dobrelya, >> Irc #bogdando >> >> >> -- Sergii Golovatiuk Senior Software Developer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgilbert at redhat.com Tue Jul 14 17:19:46 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Tue, 14 Jul 2020 18:19:46 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714101616.5d3a9e75@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> Message-ID: <20200714171946.GL2728@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to check > > > if one device is able to migrate to another device before triggering a real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each device's > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > userspace tools read the migration_version as a string from the source device, > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > - any one of the two devices does not have a migration_version attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. (As I've said in the previous discussion, off one of the patch series) My view is it makes sense to have a half-way house on the opaqueness of this string; I'd expect to have an ID and version that are human readable, maybe a device ID/name that's human interpretable and then a bunch of other cruft that maybe device/vendor/version specific. I'm thinking that we want to be able to report problems and include the string and the user to be able to easily identify the device that was complaining and notice a difference in versions, and perhaps also use it in compatibility patterns to find compatible hosts; but that does get tricky when it's a 'ask the device if it's compatible'. Dave > > > (2) backgrounds > > > > > > The reason we hope the migration_version string is opaque to the userspace > > > is that it is hard to generalize standard comparing fields and comparing > > > methods for different devices from different vendors. > > > Though userspace now could still do a simple string compare to check if > > > two devices are compatible, and result should also be right, it's still > > > too limited as it excludes the possible candidate whose migration_version > > > string fails to be equal. > > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > > with another MDEV with mdev_type_3, aggregator count 1, even their > > > migration_version strings are not equal. > > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > > > besides that, driver version + configured resources are all elements demanding > > > to take into account. > > > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > > in a simple reading from source side and writing for test in the target side way. > > > > > > > > > we then think the device compatibility issues for live migration with assigned > > > devices can be divided into two steps: > > > a. management tools filter out possible migration target devices. > > > Tags could be created according to info from product specification. > > > we think openstack/ovirt may have vendor proprietary components to create > > > those customized tags for each product from each vendor. > > > > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > > search target vGPU are like: > > > a tag for compatible parent PCI IDs, > > > a tag for a range of gvt driver versions, > > > a tag for a range of mdev type + aggregator count > > > > > > for NVMe VF, the tags to search target VF may be like: > > > a tag for compatible PCI IDs, > > > a tag for a range of driver versions, > > > a tag for URL of configured remote storage. > > I interpret this as hand waving, ie. the first step is for management > tools to make a good guess :-\ We don't seem to be willing to say that > a given mdev type can only migrate to a device with that same type. > There's this aggregation discussion happening separately where a base > mdev type might be created or later configured to be equivalent to a > different type. The vfio migration API we've defined is also not > limited to mdev devices, for example we could create vendor specific > quirks or hooks to provide migration support for a physical PF/VF > device. Within the realm of possibility then is that we could migrate > between a physical device and an mdev device, which are simply > different degrees of creating a virtualization layer in front of the > device. > > > Requiring management application developers to figure out this possible > > compatibility based on prod specs is really unrealistic. Product specs > > are typically as clear as mud, and with the suggestion we consider > > different rules for different types of devices, add up to a huge amount > > of complexity. This isn't something app developers should have to spend > > their time figuring out. > > Agreed. > > > The suggestion that we make use of vendor proprietary helper components > > is totally unacceptable. We need to be able to build a solution that > > works with exclusively an open source software stack. > > I'm surprised to see this as well, but I'm not sure if Yan was really > suggesting proprietary software so much as just vendor specific > knowledge. > > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > what versions are supported on a given device. This puts the job of > > reporting compatible versions directly under the responsibility of the > > vendor who writes the kernel driver for it. They are the ones with the > > best knowledge of the hardware they've built and the rules around its > > compatibility. > > The version string discussed previously is the version string that > represents a given device, possibly including driver information, > configuration, etc. I think what you're asking for here is an > enumeration of every possible version string that a given device could > accept as an incoming migration stream. If we consider the string as > opaque, that means the vendor driver needs to generate a separate > string for every possible version it could accept, for every possible > configuration option. That potentially becomes an excessive amount of > data to either generate or manage. > > Am I overestimating how vendors intend to use the version string? > > We'd also need to consider devices that we could create, for instance > providing the same interface enumeration prior to creating an mdev > device to have a confidence level that the new device would be a valid > target. > > We defined the string as opaque to allow vendor flexibility and because > defining a common format is hard. Do we need to revisit this part of > the discussion to define the version string as non-opaque with parsing > rules, probably with separate incoming vs outgoing interfaces? Thanks, > > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From johnsomor at gmail.com Tue Jul 14 18:02:10 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 14 Jul 2020 11:02:10 -0700 Subject: [octavia] Replace broken amphoras In-Reply-To: References: Message-ID: Hi again, So looking at the patch in question, yes, upgrading to the latest version of Octavia for Rocky, 3.2.2 will resolve the DNS issue going forward. It was originally included in the 3.2.0 release for Rocky, but we would recommend updating to the latest Rocky release, 3.2.2. Michael On Tue, Jul 14, 2020 at 9:44 AM Fabian Zimmermann wrote: > > Hi, > > Am Di., 14. Juli 2020 um 02:04 Uhr schrieb Michael Johnson : >> >> Sorry you have run into trouble and we have missed you in the IRC channel. > > Thanks for your great work and support! > >> >> Yeah, that transcript from three years ago isn't going to be much help. > > Arg. > >> >> A few things we will want to know are: >> 1. What version of Octavia are you using? > > > 3.1.0 > >> >> 2. Do you have the DNS extension to neutron enabled? > > > yes > >> >> 3. When it said "unable to attach port to amphora", can you provide >> the full error? Was it due to a hostname mismatch error from nova? > > > arg, debug logs got already rotated. I will repeat my debug-session and paste the output. > > Any suggestions what I should do? Maybe I can already try something different? > >> My guess is you ran into the issue where a port will not attach if the >> DNS name doesn't match. Our workaround for that accidentally got >> removed and re-added in https://review.opendev.org/#/c/663277/. > > > So, this should already be fixed in stable/rocky. Should upgrading octavia to latest stable/rocky be enough to get my amphoras working again? > >> Replacing a vrrp_port is tricky, so I'm not surprised you ran into >> some trouble. Can you please provide the controller worker log output >> when doing a load balancer failover (let's not use amphora failover >> here) on paste.openstack.org? You can mark it private and directly >> reply to me if you have concerns about the log content. > > > Will provide this asap. > >> >> All this said, I have recently completely refactored the failover >> flows recently. This has already merged on the master branch and >> backports are in process. > > > Thanks a lot, > > Fabian From alex.williamson at redhat.com Tue Jul 14 20:47:15 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 14:47:15 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714164722.GL25187@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714164722.GL25187@redhat.com> Message-ID: <20200714144715.0ef70074@x1.home> On Tue, 14 Jul 2020 17:47:22 +0100 Daniel P. Berrangé wrote: > On Tue, Jul 14, 2020 at 10:16:16AM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 11:21:29 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > driver and is completely opaque to the userspace. > > > > for a Intel vGPU, string format can be defined like > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > for a QAT VF, it may be > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > It's very strange to define it as opaque and then proceed to describe > > the contents of that opaque string. The point is that its contents > > are defined by the vendor driver to describe the device, driver version, > > and possibly metadata about the configuration of the device. One > > instance of a device might generate a different string from another. > > The string that a device produces is not necessarily the only string > > the vendor driver will accept, for example the driver might support > > backwards compatible migrations. > > > > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > > what versions are supported on a given device. This puts the job of > > > reporting compatible versions directly under the responsibility of the > > > vendor who writes the kernel driver for it. They are the ones with the > > > best knowledge of the hardware they've built and the rules around its > > > compatibility. > > > > The version string discussed previously is the version string that > > represents a given device, possibly including driver information, > > configuration, etc. I think what you're asking for here is an > > enumeration of every possible version string that a given device could > > accept as an incoming migration stream. If we consider the string as > > opaque, that means the vendor driver needs to generate a separate > > string for every possible version it could accept, for every possible > > configuration option. That potentially becomes an excessive amount of > > data to either generate or manage. > > > > Am I overestimating how vendors intend to use the version string? > > If I'm interpreting your reply & the quoted text orrectly, the version > string isn't really a version string in any normal sense of the word > "version". > > Instead it sounds like string encoding a set of features in some arbitrary > vendor specific format, which they parse and do compatibility checks on > individual pieces ? One or more parts may contain a version number, but > its much more than just a version. > > If that's correct, then I'd prefer we didn't call it a version string, > instead call it a "capability string" to make it clear it is expressing > a much more general concept, but... I'd agree with that. The intent of the previous proposal was to provide and interface for reading a string and writing a string back in where the result of that write indicated migration compatibility with the device. So yes, "version" is not the right term. > > We'd also need to consider devices that we could create, for instance > > providing the same interface enumeration prior to creating an mdev > > device to have a confidence level that the new device would be a valid > > target. > > > > We defined the string as opaque to allow vendor flexibility and because > > defining a common format is hard. Do we need to revisit this part of > > the discussion to define the version string as non-opaque with parsing > > rules, probably with separate incoming vs outgoing interfaces? Thanks, > > ..even if the huge amount of flexibility is technically relevant from the > POV of the hardware/drivers, we should consider whether management apps > actually want, or can use, that level of flexibility. > > The task of picking which host to place a VM on has alot of factors to > consider, and when there are a large number of hosts, the total amount > of information to check gets correspondingly large. The placement > process is also fairly performance critical. > > Running complex algorithmic logic to check compatibility of devices > based on a arbitrary set of rules is likely to be a performance > challenge. A flat list of supported strings is a much simpler > thing to check as it reduces down to a simple set membership test. > > IOW, even if there's some complex set of device type / vendor specific > rules to check for compatibility, I fear apps will ignore them and > just define a very simplified list of compatible string, and ignore > all the extra flexibility. There's always the "try it and see if it works" interface, which is essentially what we have currently. With even a simple version of what we're trying to accomplish here, there's still a risk that a management engine might rather just ignore it and restrict themselves to 1:1 mdev type matches, with or without knowing anything about the vendor driver version, relying on the migration to fail quickly if the devices are incompatible. If the complexity of the interface makes it too complicated or time consuming to provide sufficient value above such an algorithm, there's not much point to implementing it, which is why Yan has included so many people in this discussion. > I'm sure OpenStack maintainers can speak to this more, as they've put > alot of work into their scheduling engine to optimize the way it places > VMs largely driven from simple structured data reported from hosts. I think we've weeded out that our intended approach is not worthwhile, testing a compatibility string at a device is too much overhead, we need to provide enough information to the management engine to predict the response without interaction beyond the initial capability probing. As you've identified above, we're really dealing with more than a simple version, we need to construct a compatibility string and we need to start defining what goes into that. The first item seems to be that we're defining compatibility relative to a vfio migration stream, vfio devices have a device API, such as vfio-pci, so the first attribute might simply define the device API. Once we have a class of devices we might then be able to use bus specific attributes, for example the PCI vendor and device ID (other bus types TBD). We probably also need driver version numbers, so we need to include both the driver name as well as version major and minor numbers. Rules need to be put in place around what we consider to be viable version matches, potentially as Sean described. For example, does the major version require a match? Do we restrict to only formward, ie. increasing, minor number matches within that major verison? Do we then also have section that includes any required device attributes to result in a compatible device. This would be largely focused on mdev, but I wouldn't rule out others. For example if an aggregation parameter is required to maintain compatibility, we'd want to specify that as a required attribute. So maybe we end up with something like: { "device_api": "vfio-pci", "vendor": "vendor-driver-name", "version": { "major": 0, "minor": 1 }, "vfio-pci": { // Based on above device_api "vendor": 0x1234, // Values for the exposed device "device": 0x5678, // Possibly further parameters for a more specific match } "mdev_attrs": [ { "attribute0": "VALUE" } ] } The sysfs interface would return an array containing one or more of these for each device supported. I'm trying to account for things like aggregation via the mdev_attrs section, but I haven't really put it all together yet. I think Intel folks want to be able to say mdev type foo-3 is compatible with mdev type foo-1 so long as foo-1 is created with an aggregation attribute value of 3, but I expect both foo-1 and foo-3 would have the same user visible PCI vendor:device IDs If we use mdev type rather than the resulting device IDs, then we introduce an barrier to phys<->mdev migration. We could specify the subsystem values though, for example foo-1 might correspond to subsystem IDs 8086:0001 and foo3 8086:0003, then we can specify that creating an foo-1 from this device doesn't require any attributes, but creating a foo-3 does. I'm nervous how that scales though. NB. I'm also considering how portions of this might be compatible with mdevctl such that we could direct mdevctl to create a compatible device using information from this compatibility interface. Thanks, Alex From kevin at cloudnull.com Wed Jul 15 03:52:22 2020 From: kevin at cloudnull.com (Carter, Kevin) Date: Tue, 14 Jul 2020 22:52:22 -0500 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: Absolutely, +1 On Tue, Jul 14, 2020 at 08:35 Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > > -- > Emilien Macchi > -- Kevin Carter IRC: Cloudnull -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Wed Jul 15 06:26:13 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 15 Jul 2020 08:26:13 +0200 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: I have deployed that with tripleO, but now we are recabling and redeploying it. So once I have it running I can share my configs, just name which you want :) On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: > I have. That's the Triple-O docs and they don't go through the normal > .conf files to explain how it works outside of Triple-O. It has some ideas > but no running configurations. > > Tom King > > On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis > wrote: > >> hi, have you checked: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >> ? >> I am following this link. I only have one network, having different >> issues tho ;) >> >> >> >> On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: >> >>> Thank you, Amy! >>> >>> Tom >>> >>> On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: >>> >>>> Hey Tom, >>>> >>>> Adding the OpenStack discuss list as I think you got several replies >>>> from there as well. >>>> >>>> Thanks, >>>> >>>> Amy (spotz) >>>> >>>> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >>>> wrote: >>>> >>>>> Good day, >>>>> >>>>> I'm bringing up a thread from June about DHCP relay with neutron >>>>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>>>> not have the plain config/neutron config to show how a regular Ironic setup >>>>> would use DHCP relay. >>>>> >>>>> The Neutron segments docs state that I must have a unique physical >>>>> network name. If my Ironic controller has a single provisioning network >>>>> with a single physical network name, doesn't this prevent my use of >>>>> multiple segments? >>>>> >>>>> Further, the segments docs state this: "The operator must ensure that >>>>> every compute host that is supposed to participate in a router provider >>>>> network has direct connectivity to one of its segments." (section 3 at >>>>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>>>> current docs state the same thing) >>>>> This defeats the purpose of using DHCP relay, though, where the Ironic >>>>> controller does *not* have direct connectivity to the remote segment. >>>>> >>>>> Here is a rough drawing - what is wrong with my thinking here? >>>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP >>>>> relay <------> Ironic controller, provisioning network: >>>>> 10.146.29.192/26 VLAN 2115 >>>>> >>>>> Thank you, >>>>> Tom King >>>>> _______________________________________________ >>>>> openstack-mentoring mailing list >>>>> openstack-mentoring at lists.openstack.org >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>>>> >>>> >> >> -- >> Ruslanas Gžibovskis >> +370 6030 7030 >> > -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From michele at acksyn.org Wed Jul 15 06:28:59 2020 From: michele at acksyn.org (Michele Baldessari) Date: Wed, 15 Jul 2020 08:28:59 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: <20200715062859.GA2712@holtby.localdomain> +1 On Tue, Jul 14, 2020 at 10:52:22PM -0500, Carter, Kevin wrote: > Absolutely, +1 > > On Tue, Jul 14, 2020 at 08:35 Emilien Macchi wrote: > > > Hi folks, > > > > Rabi has proved deep technical understanding on the TripleO components > > over the last years. > > Initially as a major maintainer of the Heat project and then a regular > > contributor to TripleO, he got involved at different levels: > > - Optimization of the Heat templates, to reduce the number of resources or > > improve them to make it faster and more efficient at scale. > > - Migration of the Mistral workflows into native Ansible modules and > > Python code into tripleo-common, with end-to-end expertise. > > - Regular contributions to the container tooling integration. > > > > Being involved on the mailing-list and IRC channels, Rabi is always > > helpful to the community and here to help. > > He has provided thorough reviews in principal components on TripleO as > > well as a lot of bug fixes or new features; which contributed to make > > TripleO more stable and scalable. I would like to propose him be part of > > the TripleO core team. > > > > Thanks Rabi for your hard work! > > > > -- > > Emilien Macchi > > > -- > Kevin Carter > IRC: Cloudnull -- Michele Baldessari C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D From cjeanner at redhat.com Wed Jul 15 11:01:47 2020 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Wed, 15 Jul 2020 13:01:47 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: Of course +1! On 7/14/20 3:30 PM, Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources > or improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi -- Cédric Jeanneret (He/Him/His) Sr. Software Engineer - OpenStack Platform Deployment Framework TC Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From johfulto at redhat.com Wed Jul 15 12:00:40 2020 From: johfulto at redhat.com (John Fulton) Date: Wed, 15 Jul 2020 08:00:40 -0400 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 I thought he was already a core. On Wed, Jul 15, 2020 at 7:05 AM Cédric Jeanneret wrote: > Of course +1! > > On 7/14/20 3:30 PM, Emilien Macchi wrote: > > Hi folks, > > > > Rabi has proved deep technical understanding on the TripleO components > > over the last years. > > Initially as a major maintainer of the Heat project and then a regular > > contributor to TripleO, he got involved at different levels: > > - Optimization of the Heat templates, to reduce the number of resources > > or improve them to make it faster and more efficient at scale. > > - Migration of the Mistral workflows into native Ansible modules and > > Python code into tripleo-common, with end-to-end expertise. > > - Regular contributions to the container tooling integration. > > > > Being involved on the mailing-list and IRC channels, Rabi is always > > helpful to the community and here to help. > > He has provided thorough reviews in principal components on TripleO as > > well as a lot of bug fixes or new features; which contributed to make > > TripleO more stable and scalable. I would like to propose him be part of > > the TripleO core team. > > > > Thanks Rabi for your hard work! > > -- > > Emilien Macchi > > -- > Cédric Jeanneret (He/Him/His) > Sr. Software Engineer - OpenStack Platform > Deployment Framework TC > Red Hat EMEA > https://www.redhat.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Wed Jul 15 12:21:28 2020 From: zigo at debian.org (Thomas Goirand) Date: Wed, 15 Jul 2020 14:21:28 +0200 Subject: Floating IP's for routed networks In-Reply-To: <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Message-ID: Sending the message again with the correct From, as I'm not subscribed to the list with the other mailbox. On 7/15/20 2:13 PM, Thomas Goirand wrote: > Hi Ryan, > > If you don't mind, I'm adding the openstack-discuss list in the loop, as > this topic may be of interest to others. > > For mailing list readers, I'm trying to implement this: > https://review.opendev.org/#/c/669395/ > but I'm having some difficulties. > > I did a bit of investigation with some added LOG.info() in the code. > > When doing: > >> openstack subnet create vm-fip \ >> --subnet-range 10.66.20.0/24 \ >> --service-type 'network:routed' \ >> --service-type 'network:floatingip' \ >> --network multisegment1 > > Here's where neutron-api crashes. in db/ipam_backend_mixin.py: > > def _validate_segment(self, context, network_id, segment_id, > action=None, > old_segment_id=None): > # TODO(tidwellr) Create and use a constant for the service type > segments = subnet_obj.Subnet.get_subnet_segment_ids( > context, network_id, filtered_service_type='network:routed') > > associated_segments = set(segments) > if None in associated_segments and len(associated_segments) > 1: > raise segment_exc.SubnetsNotAllAssociatedWithSegments( > network_id=network_id) > > SubnetsNotAllAssociatedWithSegments() is raised, as you must already > guessed. Here's the values... > > associated_segments is an array containing 3 values: 2 being the IDs of > the segments I added previously, the 3rd one being None. This test is > then matched. Where is that None value coming from? Is this the new > subnet I'm trying to add? Maybe the > filtered_service_type='network:routed' in the call: > subnet_obj.Subnet.get_subnet_segment_ids() isn't working as expected? > > Printing the SQL query that is checked shows: > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > WHERE subnets.network_id = %(network_id_1)s AND subnets.id NOT IN > (SELECT subnet_service_types.subnet_id AS subnet_service_types_subnet_id > FROM subnet_service_types > WHERE subnets.network_id = %(network_id_2)s AND > subnet_service_types.subnet_id = subnets.id AND > subnet_service_types.service_type = %(service_type_1)s) > > though when doing by hand: > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > > the db has only 2 subnets, so it looks like the floating-ip subnet got > added before the check, and is then removed when the above test fails. > > So I just removed the raise, and could add the subnet I wanted, but > that's obviously not a long term solution. > > Your thoughts? > > Another problem that I'm having, is that neutron-bgp-dragent is not > receiving (or processing) the messages from neutron-rpc-server. I've > enabled DEBUG mode for oslo_messaging, and found out that when dr-agent > starts and prints "Agent has just been revived. Scheduling full sync", > it does send a message to neutron-rpc-server, which is replied, but it > doesn't look like dr-agent processes the return message in its reply > queue, and then prints in the logs: "imeout in RPC method > get_bgp_speakers. Waiting for 17 seconds before next attempt. If the > server is not down, consider increasing the rpc_response_timeout option > as Neutron server(s) may be overloaded and unable to respond quickly > enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting > for a reply to message ID c1b401c9e10d481bb5e071f2c048e480". What is > weird is that a few times (rarely), it worked, and the agent gets the reply. > > What should I do to investigate further? > > Cheers, > > Thomas Goirand (zigo) > From alex.williamson at redhat.com Tue Jul 14 20:59:48 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Tue, 14 Jul 2020 14:59:48 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714171946.GL2728@work-vm> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> Message-ID: <20200714145948.17b95eb3@x1.home> On Tue, 14 Jul 2020 18:19:46 +0100 "Dr. David Alan Gilbert" wrote: > * Alex Williamson (alex.williamson at redhat.com) wrote: > > On Tue, 14 Jul 2020 11:21:29 +0100 > > Daniel P. Berrangé wrote: > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > hi folks, > > > > we are defining a device migration compatibility interface that helps upper > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > live migration compatible. > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > e.g. we could use it to check whether > > > > - a src MDEV can migrate to a target MDEV, > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > if one device is able to migrate to another device before triggering a real > > > > live migration procedure. > > > > we are not sure if this interface is of value or help to you. please don't > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > (1) interface definition > > > > The interface is defined in below way: > > > > > > > > __ userspace > > > > /\ \ > > > > / \write > > > > / read \ > > > > ________/__________ ___\|/_____________ > > > > | migration_version | | migration_version |-->check migration > > > > --------------------- --------------------- compatibility > > > > device A device B > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > userspace tools read the migration_version as a string from the source device, > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > - any one of the two devices does not have a migration_version attribute > > > > - error when reading from migration_version attribute of one device > > > > - error when writing migration_version string of one device to > > > > migration_version attribute of the other device > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > driver and is completely opaque to the userspace. > > > > for a Intel vGPU, string format can be defined like > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > for a QAT VF, it may be > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > It's very strange to define it as opaque and then proceed to describe > > the contents of that opaque string. The point is that its contents > > are defined by the vendor driver to describe the device, driver version, > > and possibly metadata about the configuration of the device. One > > instance of a device might generate a different string from another. > > The string that a device produces is not necessarily the only string > > the vendor driver will accept, for example the driver might support > > backwards compatible migrations. > > (As I've said in the previous discussion, off one of the patch series) > > My view is it makes sense to have a half-way house on the opaqueness of > this string; I'd expect to have an ID and version that are human > readable, maybe a device ID/name that's human interpretable and then a > bunch of other cruft that maybe device/vendor/version specific. > > I'm thinking that we want to be able to report problems and include the > string and the user to be able to easily identify the device that was > complaining and notice a difference in versions, and perhaps also use > it in compatibility patterns to find compatible hosts; but that does > get tricky when it's a 'ask the device if it's compatible'. In the reply I just sent to Dan, I gave this example of what a "compatibility string" might look like represented as json: { "device_api": "vfio-pci", "vendor": "vendor-driver-name", "version": { "major": 0, "minor": 1 }, "vfio-pci": { // Based on above device_api "vendor": 0x1234, // Values for the exposed device "device": 0x5678, // Possibly further parameters for a more specific match }, "mdev_attrs": [ { "attribute0": "VALUE" } ] } Are you thinking that we might allow the vendor to include a vendor specific array where we'd simply require that both sides have matching fields and values? ie. "vendor_fields": [ { "unknown_field0": "unknown_value0" }, { "unknown_field1": "unknown_value1" }, ] We could certainly make that part of the spec, but I can't really figure the value of it other than to severely restrict compatibility, which the vendor could already do via the version.major value. Maybe they'd want to put a build timestamp, random uuid, or source sha1 into such a field to make absolutely certain compatibility is only determined between identical builds? Thanks, Alex From smooney at redhat.com Tue Jul 14 21:15:33 2020 From: smooney at redhat.com (Sean Mooney) Date: Tue, 14 Jul 2020 22:15:33 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714110148.0471c03c@x1.home> Message-ID: <8ef6f52dd7e03d19c7d862350f2d1ecf070f1d63.camel@redhat.com> resending with full cc list since i had this typed up i would blame my email provier but my email client does not seam to like long cc lists. we probably want to continue on alex's thread to not split the disscusion. but i have responed inline with some example of how openstack schdules and what i ment by different mdev_types On Tue, 2020-07-14 at 20:29 +0100, Sean Mooney wrote: > On Tue, 2020-07-14 at 11:01 -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 13:33:24 +0100 > > Sean Mooney wrote: > > > > > On Tue, 2020-07-14 at 11:21 +0100, Daniel P. Berrangé wrote: > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > mdev live migration is completely possible to do but i agree with Dan barrange's comments > > > from the point of view of openstack integration i dont see calling out to a vender sepecific > > > tool to be an accpetable > > > > As I replied to Dan, I'm hoping Yan was referring more to vendor > > specific knowledge rather than actual tools. > > > > > solutions for device compatiablity checking. the sys filesystem > > > that describs the mdevs that can be created shoudl also > > > contain the relevent infomation such > > > taht nova could integrate it via libvirt xml representation or directly retrive the > > > info from > > > sysfs. > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > so vf to vf migration is not possible in the general case as there is no standarised > > > way to transfer teh device state as part of the siorv specs produced by the pci-sig > > > as such there is not vender neutral way to support sriov live migration. > > > > We're not talking about a general case, we're talking about physical > > devices which have vfio wrappers or hooks with device specific > > knowledge in order to support the vfio migration interface. The point > > is that a discussion around vfio device migration cannot be limited to > > mdev devices. > > ok upstream in openstack at least we do not plan to support generic livemigration > for passthough devivces. we cheat with network interfaces since in generaly operating > systems handel hotplug of a nic somewhat safely so wehre no abstraction layer like > an mdev is present or a macvtap device we hot unplug the nic before the migration > and attach a new one after. for gpus or crypto cards this likely would not be viable > since you can bond generic hardware devices to hide the removal and readdtion of a generic > pci device. we were hoping that there would be a convergenca around MDEVs as a way to provide > that abstraction going forward for generic device or some other new mechanisum in the future. > > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > that also makes this unviable > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > if one device is able to migrate to another device before triggering a real > > > > > live migration procedure. > > > > > > well actully that is already too late really. ideally we would want to do this compaiablity > > > check much sooneer to avoid the migration failing. in an openstack envionment at least > > > by the time we invoke libvirt (assuming your using the libvirt driver) to do the migration we have alreaedy > > > finished schduling the instance to the new host. if if we do the compatiablity check at this point > > > and it fails then the live migration is aborted and will not be retired. These types of late check lead to a > > > poor user experince as unless you check the migration detial it basically looks like the migration was ignored > > > as it start to migrate and then continuge running on the orgininal host. > > > > > > when using generic pci passhotuhg with openstack, the pci alias is intended to reference a single vendor > > > id/product > > > id so you will have 1+ alias for each type of device. that allows openstack to schedule based on the availability > > > of > > > a > > > compatibale device because we track inventories of pci devices and can query that when selecting a host. > > > > > > if we were to support mdev live migration in the future we would want to take the same declarative approch. > > > 1 interospec the capability of the deivce we manage > > > 2 create inventories of the allocatable devices and there capabilities > > > 3 schdule the instance to a host based on the device-type/capabilities and claim it atomicly to prevent raceces > > > 4 have the lower level hyperviors do addtional validation if need prelive migration. > > > > > > this proposal seams to be targeting extending step 4 where as ideally we should focuse on providing the info that > > > would > > > be relevant in set 1 preferably in a vendor neutral way vai a kernel interface like /sys. > > > > I think this is reading a whole lot into the phrase "last step". We > > want to make the information available for a management engine to > > consume as needed to make informed decisions regarding likely > > compatible target devices. > > well openstack as a management engin has 3 stages for schdule and asignment,. > in respocne to a live migration request the api does minimal valaidation then hand the task off to the conductor > service > ot orchestrate. the conductor invokes an rpc to the schduler service which makes a rest call to the plamcent service. > the placment cervice generate a set of allocation candiate for host based on qunataive and qulaitivly > queries agains an abstract resouce provider tree model of the hosts. > currently device pasthough is not modeled in placment so plamcnet is basicaly returning a set of host that have enough > cpu ram and disk for the instance. in the spacial of vGPU they technically are modelled in placement but not in a way > that would gurarentee compatiablity for migration. a generic pci device request is haneled in the second phase of > schduling called filtering and weighing. in this pahse the nova schuleer apply a series of filter to the list of host > returned by plamcnet to assert things like anit afintiy, tenant isolation or in the case of this converation nuam > affintiy and pci device avaiablity. when we have filtered the posible set of host down to X number we weigh the > listing > to select an optimal host and set of alternitive hosts. we then enter the code that this mail suggest modfiying which > does an rpc call to the destiation host form teh conductor to have it assert compatiablity which internaly calls back > to > the sourc host. > > so my point is we have done a lot of work by the time we call check_can_live_migrate_destination and failing > at this point is considerd quite a late failure but its still better then failing when qemu actully tries to migrate. > in general we would prefer to move compatiablity check as early in that workflow as possible but to be fair we dont > actully check cpu model compatiablity until check_can_live_migrate_destination. > https://github.com/openstack/nova/blob/8988316b8c132c9662dea6cf0345975e87ce7344/nova/virt/libvirt/driver.py#L8325-L8331 > > if we needed too we could read the version string on the source and write the version string on the dest at this > point. > doing so however would be considerd, inelegant, we have found this does not scale as the first copmpatabilty check. > for cpu for example there are way to filter hosts by groups sets fo host with the same cpu or filtering on cpu feature > flags that happen in the placment or filter stage both of which are very early and cheap to do at runtime. > > the "read for version, write for compatibility" workflow could be used as a final safe check if required but > probing for compatibility via writes is basicaly considered an anti patteren in openstack. we try to always > assert compatibility by reading avaiable info and asserting requirement over it not testing to see if it works. > > this has come up in the past in the context of virtio feature flag where the idea of spawning an instrance or trying > to add a virtio port to ovs dpdk that reqested a specific feature flag was rejected as unacceptable from a performance > and security point of view. > > > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > this might be useful as we could tag the inventory with the migration version and only might to > > > devices with the same version > > > > Is cross version compatibility something that you'd consider using? > > yes but it would depend on what cross version actully ment. > > the version of an mdev is not something we would want to be exposed to endusers. > it would be a security risk to do so as the version sting would potentaily allow the untrused user > to discover if a device has an unpatch vulnerablity. as a result in the context of live migration > we can only support cross verion compatiabilyt if the device in the guest does not alter as > part of the migration and the behavior does not change. > > going form version 1.0 with feature X to verions 1.1 with feature X and Y but only X enabled would > be fine. going gorm 1.0 to 2.0 where thre is only feature Y would not be ok. > being abstract makes it a little harder to readabout but i guess i would sumerisei if its > transparent to the guest for the lifetime of the qemu process then its ok for the backing version to change. > if a vm is rebooted its also ok fo the vm to pick up feature Y form the 1.1 device although at that point > it could not be migrated back to the 1.0 host as it now has feature X and Y and 1.0 only has X so that woudl be > an obserable change if it was drop as a reult of the live migration. > > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > this would not be useful as the schduler cannot directlly connect to the compute host > > > and even if it could it would be extreamly slow to do this for 1000s of hosts and potentally > > > multiple devices per host. > > > > Seems similar to Dan's requirement, looks like the 'read for version, > > write for compatibility' test idea isn't really viable. > > its ineffiecnt and we have reject adding such test in the case of virtio-feature flag compatiabilty > in the past, so its more an option of last resourt if we have no other way to support compatiablity > checking. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > driver and is completely opaque to the userspace. > > > > > > opaque vendor specific stings that higher level orchestros have to pass form host > > > to host and cant reason about are evil, when allowed they prolifroate and > > > makes any idea of a vendor nutral abstraction and interoperablity between systems > > > impossible to reason about. that said there is a way to make it opaue but still useful > > > to userspace. see below > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > honestly i would much prefer if the version string was just a semver string. > > > e.g. {major}.{minor}.{bugfix} > > > > > > if you do a driver/frimware update and break compatiablity with an older version bump the > > > major version. > > > > > > if you add optional a feature that does not break backwards compatiablity if you migrate > > > an older instance to the new host then just bump the minor/feature number. > > > > > > if you have a fix for a bug that does not change the feature set or compatiblity backwards or > > > forwards then bump the bugfix number > > > > > > then the check is as simple as > > > 1.) is the mdev type the same > > > 2.) is the major verion the same > > > 3.) am i going form the same version to same version or same version to newer version > > > > > > if all 3 are true we can migrate. > > > e.g. > > > 2.0.1 -> 2.1.1 (ok same major version and migrating from older feature release to newer feature release) > > > 2.1.1 -> 2.0.1 (not ok same major version and migrating from new feature release to old feature release may be > > > incompatable) > > > 2.0.0 -> 3.0.0 (not ok chaning major version) > > > 2.0.1 -> 2.0.0 (ok same major and minor version, all bugfixs in the same minor release should be compatibly) > > > > What's the value of the bugfix field in this scheme? > > its not require but really its for a non visable chagne form a feature standpoint. > a rather contrived example but if it was quadratic to inital a set of queues or device bufferes > in 1.0.0 and you made it liniar in 1.0.1 that is a performace improvment in the device intialisation time > which is great but it would not affect the feature set or compatiablity in any way. you could call it > a feature but its really just an internal change but you might want to still bump the version number. > > > > The simplicity is good, but is it too simple. It's not immediately > > clear to me whether all features can be hidden behind a minor version. > > For instance, if we have an mdev device that supports this notion of > > aggregation, which is proposed as a solution to the problem that > > physical hardware might support lots and lots of assignable interfaces > > which can be combined into arbitrary sets for mdev devices, making it > > impractical to expose an mdev type for every possible enumeration of > > assignable interfaces within a device. > > so this is a modeling problem and likely a limitation of the current way an mdev_type is exposed. > stealing some linux doc eamples > > > |- [parent physical device] > |--- Vendor-specific-attributes [optional] > |--- [mdev_supported_types] > | |--- [] > | | |--- create > | | |--- name > | | |--- available_instances > | | |--- device_api > | | |--- description > > you could adress this in 1 of at least 3 ways. > 1.) mdev type for each enmartion which is fine for 1-2 variabley othersize its a combinitroial explotions. > 2.) report each of the consomable sub componetns as an mdev type and create mupltipel mdevs and assign them to the vm. > 3.) provider an api to dynamically compose mdevs types which staticaly partion the reqouese and can then be consomed > perferably embeding the resouce infomation in the description filed in a huma/machince readable form. > > 2 and 3 woudl work well with openstack however they both have there challanges > 1 doesnt really work for anyone out side of a demo. > > We therefore expose a base type > > where the aggregation is built later. This essentially puts us in a > > scenario where even within an mdev type running on the same driver, > > there are devices that are not directly compatible with each other. > > > > > we dont need vendor to rencode the driver name or vendor id and product id in the string. that info is alreay > > > available both to the device driver and to userspace via /sys already we just need to know if version of > > > the same mdev are compatiable so a simple semver version string which is well know in the software world > > > at least is a clean abstration we can reuse. > > > > This presumes there's no cross device migration. > > no but it does assume no cross mdev_type migration. > it assuems that nvida_mdev_type_x on host 1 is the same as nvida_mdev_type_x on host 2. > if the parent device differese but support the same mdev type we are asserting that they > should be compatiable or a differnt mdev_type name should be used on each device. > > so we are presuming the mdev type cant change as part of a live migration and if the type > was to change it would no longer be a live migration operation it would be something else. > that is based on the premis that changing the mdev type would change the capabilities of the mdev > > > An mdev type can only > > be migrated to the same mdev type, all of the devices within that type > > have some based compatibility, a phsyical device can only be migrated to > > the same physical device. In the latter case what defines the type? > > the type-id in /sysfs > > /sys/devices/virtual/mtty/mtty/ > |-- mdev_supported_types > | |-- mtty-1 <---- this is an mdev type > | | |-- available_instances > | | |-- create > | | |-- device_api > | | |-- devices > | | `-- name > | `-- mtty-2 <---- as is this > | |-- available_instances > | |-- create > | |-- device_api > | |-- devices > | `-- name > > |- [parent phy device] > |--- [$MDEV_UUID] > |--- remove > |--- mdev_type {link to its type} <-- here > |--- vendor-specific-attributes [optional] > > > If > > it's a PCI device, is it only vendor:device IDs? > > no the mdev type is not defined by the vendor:device id of the parent device > although the capablityes of that device will determin what mdev types if any it supprots. > > What about revision? > > What about subsystem IDs? > > at least for nvidia gpus i dont think if you by an evga branded v100 vs an pny branded one the capability > would change but i do know that certenly the capablities of a dell branding intel nic and an intel branded > one can. e.g. i have seen oem sku nics without sriov eventhoguh the same nic form intel supports it. > sriov was deliberatly disabled in the dell firmware even though it share dhte same vendor and prodcut id but differnt > subsystem id. > > if the odm made an incomatipable change like that which affect an mdev type in some way i guess i would expect them to > change the name or the description filed content to signal that. > > > What about possibly an onboard ROM or > > internal firmware? > > i would expect that updating the firmware/rom could result in changing a version string. that is how i was imagining > it would change. > > The information may be available, but which things > > are relevant to migration? > > that i dont know an i really would not like to encode that knolage in the vendor specific way in higher level > tools like openstack or even libvirt. declarative version sting comparisons or even simile feature flag > check where an abstract huristic that can be applied across vendors would be fine. but yes i dont know > what info would be needed in this case. > > We already see desires to allow migration > > between physical and mdev, > > migration between a phsical device and an mdev would not generally be considered a live migration in openstack. > that would be a different operation as it would be user visible withing the guest vm. > > but also to expose mdev types that might be > > composable to be compatible with other types. Thanks, > > i think composable mdev types are really challanging without some kind of feature flag concept > like cpu flags or ethtool nic capablities that are both human readable and easily parsable. > > we have the capability to schedule on cpu flags or gpu cuda level using a traits abstraction > so instead of saying i want an vm on a host with an intel 2695v3 to ensure it has AVX > you say i want an vm that is capable of using AVX > https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86/__init__.py#L18 > > we also have trait for cuda level so instead of asking for a specifc mdev type or nvida > gpu the idea was you woudl describe what feature cuda in this exmple you need > https://github.com/openstack/os-traits/blob/master/os_traits/hw/gpu/cuda.py#L16-L45 > > That is what we call qualitative schudleing and is why we create teh placement service. > with out going in to the weeds we try to decouple quantaitive request such as 4 cpus and 1G of ram > form the qunative i need AVX supprot > > e.g. resouces:VCPU=4,resouces:MEMORY_MB=1024 triats:required=HW_CPU_X86_AVX > > declarative quantitive and capablites reporting of resouces fits easily into that model. > dynamic quantities that change as other mdev are allocated from the parent device or as > new mdevs types are composed on the fly are very challenging. > > > > > Alex > > > > From soulxu at gmail.com Wed Jul 15 07:23:42 2020 From: soulxu at gmail.com (Alex Xu) Date: Wed, 15 Jul 2020 15:23:42 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714101616.5d3a9e75@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> Message-ID: Alex Williamson 于2020年7月15日周三 上午12:16写道: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps > upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the > two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to > check > > > if one device is able to migrate to another device before triggering a > real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please > don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > sysfs node. e.g. > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > userspace tools read the migration_version as a string from the source > device, > > > and write it to the migration_version sysfs attribute in the target > device. > > > > > > The userspace should treat ANY of below conditions as two devices not > compatible: > > > - any one of the two devices does not have a migration_version > attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device > vendor > > > driver and is completely opaque to the userspace. > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > If the "configured remote storage URL" is something configuration setting before the usage, then it isn't something we need for migration compatible check. Openstack only needs to know the target device's driver and hardware compatible for migration, then the scheduler will choose a host which such device, and then Openstack will pre-configure the target host and target device before the migration, then openstack will configure the correct remote storage URL to the device. If we want, we can do a sanity check after the live migration with the os. > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. > > > > (2) backgrounds > > > > > > The reason we hope the migration_version string is opaque to the > userspace > > > is that it is hard to generalize standard comparing fields and > comparing > > > methods for different devices from different vendors. > > > Though userspace now could still do a simple string compare to check if > > > two devices are compatible, and result should also be right, it's still > > > too limited as it excludes the possible candidate whose > migration_version > > > string fails to be equal. > > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably > compatible > > > with another MDEV with mdev_type_3, aggregator count 1, even their > > > migration_version strings are not equal. > > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > > > besides that, driver version + configured resources are all elements > demanding > > > to take into account. > > > > > > So, we hope leaving the freedom to vendor driver and let it make the > final decision > > > in a simple reading from source side and writing for test in the > target side way. > > > > > > > > > we then think the device compatibility issues for live migration with > assigned > > > devices can be divided into two steps: > > > a. management tools filter out possible migration target devices. > > > Tags could be created according to info from product specification. > > > we think openstack/ovirt may have vendor proprietary components to > create > > > those customized tags for each product from each vendor. > > > > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags > to > > > search target vGPU are like: > > > a tag for compatible parent PCI IDs, > > > a tag for a range of gvt driver versions, > > > a tag for a range of mdev type + aggregator count > > > > > > for NVMe VF, the tags to search target VF may be like: > > > a tag for compatible PCI IDs, > > > a tag for a range of driver versions, > > > a tag for URL of configured remote storage. > > I interpret this as hand waving, ie. the first step is for management > tools to make a good guess :-\ We don't seem to be willing to say that > a given mdev type can only migrate to a device with that same type. > There's this aggregation discussion happening separately where a base > mdev type might be created or later configured to be equivalent to a > different type. The vfio migration API we've defined is also not > limited to mdev devices, for example we could create vendor specific > quirks or hooks to provide migration support for a physical PF/VF > device. Within the realm of possibility then is that we could migrate > between a physical device and an mdev device, which are simply > different degrees of creating a virtualization layer in front of the > device. > > > Requiring management application developers to figure out this possible > > compatibility based on prod specs is really unrealistic. Product specs > > are typically as clear as mud, and with the suggestion we consider > > different rules for different types of devices, add up to a huge amount > > of complexity. This isn't something app developers should have to spend > > their time figuring out. > > Agreed. > > > The suggestion that we make use of vendor proprietary helper components > > is totally unacceptable. We need to be able to build a solution that > > works with exclusively an open source software stack. > > I'm surprised to see this as well, but I'm not sure if Yan was really > suggesting proprietary software so much as just vendor specific > knowledge. > > > IMHO there needs to be a mechanism for the kernel to report via sysfs > > what versions are supported on a given device. This puts the job of > > reporting compatible versions directly under the responsibility of the > > vendor who writes the kernel driver for it. They are the ones with the > > best knowledge of the hardware they've built and the rules around its > > compatibility. > > The version string discussed previously is the version string that > represents a given device, possibly including driver information, > configuration, etc. I think what you're asking for here is an > enumeration of every possible version string that a given device could > accept as an incoming migration stream. If we consider the string as > opaque, that means the vendor driver needs to generate a separate > string for every possible version it could accept, for every possible > configuration option. That potentially becomes an excessive amount of > data to either generate or manage. For the configuration options, there are two kinds of configuration options are needn't for the migration check. * The configuration option makes the device different, for example(could be wrong example, not matching any real hardware), A GPU supports 1024* 768 resolution and 800 * 600 resolution VGPUs, the OpenStack will separate this two kinds of VGPUs into two separate resource pool. so the scheduler already ensures we get a host with such vGPU support. so it needn't encode into the 'version string' discussed here. * The configuration option is setting before usage, just like the 'configured remote storage URL' above, it needn't encoded into the 'version string' also. Since the openstack will configure the correct value before the migration. > Am I overestimating how vendors intend to use the version string? > > We'd also need to consider devices that we could create, for instance > providing the same interface enumeration prior to creating an mdev > device to have a confidence level that the new device would be a valid > target. > > We defined the string as opaque to allow vendor flexibility and because > defining a common format is hard. Do we need to revisit this part of > the discussion to define the version string as non-opaque with parsing > rules, probably with separate incoming vs outgoing interfaces? Thanks, > > Alex > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Wed Jul 15 07:37:19 2020 From: soulxu at gmail.com (Alex Xu) Date: Wed, 15 Jul 2020 15:37:19 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714145948.17b95eb3@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: Alex Williamson 于2020年7月15日周三 上午5:00写道: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that > helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of > the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to > check > > > > > if one device is able to migrate to another device before > triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. > please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > > > sysfs node. e.g. > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the > source device, > > > > > and write it to the migration_version sysfs attribute in the > target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices > not compatible: > > > > > - any one of the two devices does not have a migration_version > attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by > device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > driver name to > > > > > each migration_version string. e.g. > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver > version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > The OpenStack Placement service doesn't support to filtering the target host by the semver syntax, altough we can code this filtering logic inside scheduler filtering by python code. Basically, placement only supports filtering the host by traits (it is same thing with labels, tags). The nova scheduler will call the placement service to filter the hosts first, then go through all the scheduler filters. That would be great if the placement service can filter out more hosts which isn't compatible first, and then it is better. > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > OpenStack already based on vendor and device id to separate the devices into the different resource pool, then the scheduler based on that to filer the hosts, so I think it needn't be the part of this compatibility string. > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > Since the placement support traits (labels, tags), so the placement just to matching those fields, so it isn't problem of openstack, since openstack needn't to know the meaning of those fields. But the traits is just a label, it isn't key-value format. But also if we have to, we can code this scheduler filter by python code. But the same thing as above, the invalid host can't be filtered out in the first step placement service filtering. > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only determined > between identical builds? Thanks, > > Alex > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgilbert at redhat.com Wed Jul 15 08:23:09 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Wed, 15 Jul 2020 09:23:09 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714145948.17b95eb3@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: <20200715082309.GC2864@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > if one device is able to migrate to another device before triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the source device, > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only determined > between identical builds? Thanks, No, I'd mostly anticipated matching on the vendor and device and maybe a version number for the bit the user specifies; I had assumed all that 'vendor cruft' was still mostly opaque; having said that, if it did become a list of attributes like that (some of which were vendor specific) that would make sense to me. Dave > > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From yan.y.zhao at intel.com Wed Jul 15 08:20:41 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Wed, 15 Jul 2020 16:20:41 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714145948.17b95eb3@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: <20200715082040.GA13136@joy-OptiPlex-7040> On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > if one device is able to migrate to another device before triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the source device, > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only determined > between identical builds? Thanks, > Yes, I agree kernel could expose such sysfs interface to educate openstack how to filter out devices. But I still think the proposed migration_version (or rename to migration_compatibility) interface is still required for libvirt to do double check. In the following scenario: 1. openstack chooses the target device by reading sysfs interface (of json format) of the source device. And Openstack are now pretty sure the two devices are migration compatible. 2. openstack asks libvirt to create the target VM with the target device and start live migration. 3. libvirt now receives the request. so it now has two choices: (1) create the target VM & target device and start live migration directly (2) double check if the target device is compatible with the source device before doing the remaining tasks. Because the factors to determine whether two devices are live migration compatible are complicated and may be dynamically changing, (e.g. driver upgrade or configuration changes), and also because libvirt should not totally rely on the input from openstack, I think the cost for libvirt is relatively lower if it chooses to go (2) than (1). At least it has no need to cancel migration and destroy the VM if it knows it earlier. So, it means the kernel may need to expose two parallel interfaces: (1) with json format, enumerating all possible fields and comparing methods, so as to indicate openstack how to find a matching target device (2) an opaque driver defined string, requiring write and test in target, which is used by libvirt to make sure device compatibility, rather than rely on the input accurateness from openstack or rely on kernel driver implementing the compatibility detection immediately after migration start. Does it make sense? Thanks Yan From shaohe.feng at intel.com Wed Jul 15 08:49:06 2020 From: shaohe.feng at intel.com (Feng, Shaohe) Date: Wed, 15 Jul 2020 08:49:06 +0000 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200715082040.GA13136@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> Message-ID: <7B5303F69BB16B41BB853647B3E5BD70600BB667@SHSMSX104.ccr.corp.intel.com> -----Original Message----- From: Zhao, Yan Y Sent: 2020年7月15日 16:21 To: Alex Williamson Cc: Dr. David Alan Gilbert ; Daniel P. Berrangé ; devel at ovirt.org; openstack-discuss at lists.openstack.org; libvir-list at redhat.com; intel-gvt-dev at lists.freedesktop.org; kvm at vger.kernel.org; qemu-devel at nongnu.org; smooney at redhat.com; eskultet at redhat.com; cohuck at redhat.com; dinechin at redhat.com; corbet at lwn.net; kwankhede at nvidia.com; eauger at redhat.com; Ding, Jian-feng ; Xu, Hejie ; Tian, Kevin ; zhenyuw at linux.intel.com; bao.yumeng at zte.com.cn; Wang, Xin-ran ; Feng, Shaohe Subject: Re: device compatibility interface for live migration with assigned devices On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrangé > > > wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface > > > > > that helps upper layer stack like openstack/ovirt/libvirt to > > > > > check if two devices are live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last > > > > > step to check if one device is able to migrate to another > > > > > device before triggering a real live migration procedure. > > > > > we are not sure if this interface is of value or help to you. > > > > > please don't hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __ userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > > ________/__________ ___\|/_____________ > > > > > | migration_version | | migration_version |-->check migration > > > > > --------------------- --------------------- compatibility > > > > > device A device B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under > > > > > each device's sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from > > > > > the source device, and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > - any one of the two devices does not have a migration_version > > > > > attribute > > > > > - error when reading from migration_version attribute of one > > > > > device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by > > > > > device vendor driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like "parent > > > > > device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may > > > > > prefix a driver name to each migration_version string. e.g. > > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to > > > describe the contents of that opaque string. The point is that > > > its contents are defined by the vendor driver to describe the > > > device, driver version, and possibly metadata about the > > > configuration of the device. One instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only > > > string the vendor driver will accept, for example the driver might > > > support backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch > > series) > > > > My view is it makes sense to have a half-way house on the opaqueness > > of this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then > > a bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include > > the string and the user to be able to easily identify the device > > that was complaining and notice a difference in versions, and > > perhaps also use it in compatibility patterns to find compatible > > hosts; but that does get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > "vfio-pci": { // Based on above device_api > "vendor": 0x1234, // Values for the exposed device > "device": 0x5678, > // Possibly further parameters for a more specific match > }, > "mdev_attrs": [ > { "attribute0": "VALUE" } > ] > } > > Are you thinking that we might allow the vendor to include a vendor > specific array where we'd simply require that both sides have matching > fields and values? ie. > > "vendor_fields": [ > { "unknown_field0": "unknown_value0" }, > { "unknown_field1": "unknown_value1" }, > ] > > We could certainly make that part of the spec, but I can't really > figure the value of it other than to severely restrict compatibility, > which the vendor could already do via the version.major value. Maybe > they'd want to put a build timestamp, random uuid, or source sha1 into > such a field to make absolutely certain compatibility is only > determined between identical builds? Thanks, > Yes, I agree kernel could expose such sysfs interface to educate openstack how to filter out devices. But I still think the proposed migration_version (or rename to migration_compatibility) interface is still required for libvirt to do double check. In the following scenario: 1. openstack chooses the target device by reading sysfs interface (of json format) of the source device. And Openstack are now pretty sure the two devices are migration compatible. 2. openstack asks libvirt to create the target VM with the target device and start live migration. 3. libvirt now receives the request. so it now has two choices: (1) create the target VM & target device and start live migration directly (2) double check if the target device is compatible with the source device before doing the remaining tasks. Because the factors to determine whether two devices are live migration compatible are complicated and may be dynamically changing, (e.g. driver upgrade or configuration changes), and also because libvirt should not totally rely on the input from openstack, I think the cost for libvirt is relatively lower if it chooses to go (2) than (1). At least it has no need to cancel migration and destroy the VM if it knows it earlier. So, it means the kernel may need to expose two parallel interfaces: (1) with json format, enumerating all possible fields and comparing methods, so as to indicate openstack how to find a matching target device (2) an opaque driver defined string, requiring write and test in target, which is used by libvirt to make sure device compatibility, rather than rely on the input accurateness from openstack or rely on kernel driver implementing the compatibility detection immediately after migration start. Does it make sense? [Feng, Shaohe] Yes, had better 2 interface for different phase of live migration. For (1), it is can leverage these information for scheduler to minimize the failure rate of migration. The problem is that which value should be used for scheduler guide. The values should be human readable. For (2) yes we can't assume that the migration always screenful, double check is needed. BR Shaohe Thanks Yan From berrange at redhat.com Wed Jul 15 09:16:41 2020 From: berrange at redhat.com (Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?=) Date: Wed, 15 Jul 2020 10:16:41 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200714144715.0ef70074@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714164722.GL25187@redhat.com> <20200714144715.0ef70074@x1.home> Message-ID: <20200715091641.GD68910@redhat.com> On Tue, Jul 14, 2020 at 02:47:15PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 17:47:22 +0100 > Daniel P. Berrangé wrote: > > I'm sure OpenStack maintainers can speak to this more, as they've put > > alot of work into their scheduling engine to optimize the way it places > > VMs largely driven from simple structured data reported from hosts. > > I think we've weeded out that our intended approach is not worthwhile, > testing a compatibility string at a device is too much overhead, we > need to provide enough information to the management engine to predict > the response without interaction beyond the initial capability probing. Just to clarify in case people mis-interpreted my POV... I think that testing a compatibility string at a device *is* useful, as it allows for a final accurate safety check to be performed before the migration stream starts. Libvirt could use that reasonably easily I believe. It just isn't sufficient for a complete solution. In parallel with the device level test in sysfs, we need something else to support the host placement selection problems in an efficient way, as you are trying to address in the remainder of your mail. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| From soulxu at gmail.com Wed Jul 15 09:21:09 2020 From: soulxu at gmail.com (Alex Xu) Date: Wed, 15 Jul 2020 17:21:09 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200715082040.GA13136@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> Message-ID: Yan Zhao 于2020年7月15日周三 下午4:32写道: > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that > helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid > of the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step > to check > > > > > > if one device is able to migrate to another device before > triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. > please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __ userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > > ________/__________ ___\|/_____________ > > > > > > | migration_version | | migration_version |-->check > migration > > > > > > --------------------- --------------------- compatibility > > > > > > device A device B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > > > > sysfs node. e.g. > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the > source device, > > > > > > and write it to the migration_version sysfs attribute in the > target device. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two > devices not compatible: > > > > > > - any one of the two devices does not have a migration_version > attribute > > > > > > - error when reading from migration_version attribute of one > device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by > device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix > a driver name to > > > > > > each migration_version string. e.g. > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver > version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. > > > > In the reply I just sent to Dan, I gave this example of what a > > "compatibility string" might look like represented as json: > > > > { > > "device_api": "vfio-pci", > > "vendor": "vendor-driver-name", > > "version": { > > "major": 0, > > "minor": 1 > > }, > > "vfio-pci": { // Based on above device_api > > "vendor": 0x1234, // Values for the exposed device > > "device": 0x5678, > > // Possibly further parameters for a more specific match > > }, > > "mdev_attrs": [ > > { "attribute0": "VALUE" } > > ] > > } > > > > Are you thinking that we might allow the vendor to include a vendor > > specific array where we'd simply require that both sides have matching > > fields and values? ie. > > > > "vendor_fields": [ > > { "unknown_field0": "unknown_value0" }, > > { "unknown_field1": "unknown_value1" }, > > ] > > > > We could certainly make that part of the spec, but I can't really > > figure the value of it other than to severely restrict compatibility, > > which the vendor could already do via the version.major value. Maybe > > they'd want to put a build timestamp, random uuid, or source sha1 into > > such a field to make absolutely certain compatibility is only determined > > between identical builds? Thanks, > > > Yes, I agree kernel could expose such sysfs interface to educate > openstack how to filter out devices. But I still think the proposed > migration_version (or rename to migration_compatibility) interface is > still required for libvirt to do double check. > > In the following scenario: > 1. openstack chooses the target device by reading sysfs interface (of json > format) of the source device. And Openstack are now pretty sure the two > devices are migration compatible. > 2. openstack asks libvirt to create the target VM with the target device > and start live migration. > 3. libvirt now receives the request. so it now has two choices: > (1) create the target VM & target device and start live migration directly > (2) double check if the target device is compatible with the source > device before doing the remaining tasks. > > Because the factors to determine whether two devices are live migration > compatible are complicated and may be dynamically changing, (e.g. driver > upgrade or configuration changes), and also because libvirt should not > totally rely on the input from openstack, I think the cost for libvirt is > relatively lower if it chooses to go (2) than (1). At least it has no > need to cancel migration and destroy the VM if it knows it earlier. > If the driver upgrade or configuration changes, I guess there should be a restart of openstack agent on the host, that will update the info to the scheduler. so it should be fine. For (2), probably it need be used for double check when the orchestration layer doesn't implement the check logic in the scheduler. > > So, it means the kernel may need to expose two parallel interfaces: > (1) with json format, enumerating all possible fields and comparing > methods, so as to indicate openstack how to find a matching target device > (2) an opaque driver defined string, requiring write and test in target, > which is used by libvirt to make sure device compatibility, rather than > rely on the input accurateness from openstack or rely on kernel driver > implementing the compatibility detection immediately after migration > start. > > Does it make sense? > > Thanks > Yan > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at infomaniak.com Wed Jul 15 12:13:54 2020 From: zigo at infomaniak.com (Thomas Goirand) Date: Wed, 15 Jul 2020 14:13:54 +0200 Subject: Floating IP's for routed networks In-Reply-To: References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> Message-ID: <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Hi Ryan, If you don't mind, I'm adding the openstack-discuss list in the loop, as this topic may be of interest to others. For mailing list readers, I'm trying to implement this: https://review.opendev.org/#/c/669395/ but I'm having some difficulties. I did a bit of investigation with some added LOG.info() in the code. When doing: > openstack subnet create vm-fip \ > --subnet-range 10.66.20.0/24 \ > --service-type 'network:routed' \ > --service-type 'network:floatingip' \ > --network multisegment1 Here's where neutron-api crashes. in db/ipam_backend_mixin.py: def _validate_segment(self, context, network_id, segment_id, action=None, old_segment_id=None): # TODO(tidwellr) Create and use a constant for the service type segments = subnet_obj.Subnet.get_subnet_segment_ids( context, network_id, filtered_service_type='network:routed') associated_segments = set(segments) if None in associated_segments and len(associated_segments) > 1: raise segment_exc.SubnetsNotAllAssociatedWithSegments( network_id=network_id) SubnetsNotAllAssociatedWithSegments() is raised, as you must already guessed. Here's the values... associated_segments is an array containing 3 values: 2 being the IDs of the segments I added previously, the 3rd one being None. This test is then matched. Where is that None value coming from? Is this the new subnet I'm trying to add? Maybe the filtered_service_type='network:routed' in the call: subnet_obj.Subnet.get_subnet_segment_ids() isn't working as expected? Printing the SQL query that is checked shows: SELECT subnets.segment_id AS subnets_segment_id FROM subnets WHERE subnets.network_id = %(network_id_1)s AND subnets.id NOT IN (SELECT subnet_service_types.subnet_id AS subnet_service_types_subnet_id FROM subnet_service_types WHERE subnets.network_id = %(network_id_2)s AND subnet_service_types.subnet_id = subnets.id AND subnet_service_types.service_type = %(service_type_1)s) though when doing by hand: SELECT subnets.segment_id AS subnets_segment_id FROM subnets the db has only 2 subnets, so it looks like the floating-ip subnet got added before the check, and is then removed when the above test fails. So I just removed the raise, and could add the subnet I wanted, but that's obviously not a long term solution. Your thoughts? Another problem that I'm having, is that neutron-bgp-dragent is not receiving (or processing) the messages from neutron-rpc-server. I've enabled DEBUG mode for oslo_messaging, and found out that when dr-agent starts and prints "Agent has just been revived. Scheduling full sync", it does send a message to neutron-rpc-server, which is replied, but it doesn't look like dr-agent processes the return message in its reply queue, and then prints in the logs: "imeout in RPC method get_bgp_speakers. Waiting for 17 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c1b401c9e10d481bb5e071f2c048e480". What is weird is that a few times (rarely), it worked, and the agent gets the reply. What should I do to investigate further? Cheers, Thomas Goirand (zigo) From jonathan at automatia.nl Wed Jul 15 12:34:13 2020 From: jonathan at automatia.nl (Jonathan de Jong) Date: Wed, 15 Jul 2020 14:34:13 +0200 Subject: [ideas] 3 new project drafts Message-ID: <4CEE40B4-74E6-49B1-9933-075C81D2A14C@getmailspring.com> Heya OpenStack community! The openstack-ideas website inspired me to create 3 more ideas, each based on some personal experiences and musings which OpenStack could address. Project "Dew": https://review.opendev.org/741008 (low-spec cloud computing) Project "Nebula": https://review.opendev.org/741057 (interface translation for plural or propriatary clouds) Project "Aurora": https://review.opendev.org/741165 (communal/collaborative cloud computing) I need to admit that these drafts are in my opnion extremely rough, very biased, and probably need to be rewritten several times. So that's why I invite people to discuss specifics and implementations of these ideas. If aspects of these ideas are similes of past proposals or projects, which have then been debunked/abandoned, i'm curious as to what discussion has happened for it to be rejected that way. If my language in these drafts are unreadable, confusing, simply too vague, or any other combination of sub-standard writing, please let me know. I plan to expand/improve/detail these project drafts from feedback and comments, please share if any of that comes to mind. Thanks in advance! - Jonathan de Jong From e0ne at e0ne.info Wed Jul 15 14:00:37 2020 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Wed, 15 Jul 2020 17:00:37 +0300 Subject: [horizon] No meeting today Message-ID: Hi team, I can't attend the meeting today, so let's skip it. If you've got any topics to discuss today we can do it in #openstack-horizon IRC channel. Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Jul 15 14:09:53 2020 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 15 Jul 2020 15:09:53 +0100 Subject: Floating IP's for routed networks In-Reply-To: References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Message-ID: Hi Thomas: If I'm not wrong, the goal of this filtering is to remove all those subnets with service_type='network:routed'. Maybe you can check implementing an easier query: SELECT subnets.segment_id AS subnets_segment_id FROM subnets WHERE subnets.network_id = %(network_id_1)s AND NOT (EXISTS (SELECT * FROM subnet_service_types WHERE subnets.id = subnet_service_types.subnet_id AND subnet_service_types.service_type = %(service_type_1)s)) That will be translated to python as: query = test_db.context.session.query(subnet_obj.Subnet.db_model.segment_id) query = query.filter(subnet_obj.Subnet.db_model.network_id == network_id) if filtered_service_type: query = query.filter(~exists().where(and_( subnet_obj.Subnet.db_model.id == service_type_model.subnet_id, service_type_model.service_type == filtered_service_type))) Can you provide a UTs or a way to check the problem you are experiencing? Regards. On Wed, Jul 15, 2020 at 1:27 PM Thomas Goirand wrote: > Sending the message again with the correct From, as I'm not subscribed > to the list with the other mailbox. > > On 7/15/20 2:13 PM, Thomas Goirand wrote: > > Hi Ryan, > > > > If you don't mind, I'm adding the openstack-discuss list in the loop, as > > this topic may be of interest to others. > > > > For mailing list readers, I'm trying to implement this: > > https://review.opendev.org/#/c/669395/ > > but I'm having some difficulties. > > > > I did a bit of investigation with some added LOG.info() in the code. > > > > When doing: > > > >> openstack subnet create vm-fip \ > >> --subnet-range 10.66.20.0/24 \ > >> --service-type 'network:routed' \ > >> --service-type 'network:floatingip' \ > >> --network multisegment1 > > > > Here's where neutron-api crashes. in db/ipam_backend_mixin.py: > > > > def _validate_segment(self, context, network_id, segment_id, > > action=None, > > old_segment_id=None): > > # TODO(tidwellr) Create and use a constant for the service type > > segments = subnet_obj.Subnet.get_subnet_segment_ids( > > context, network_id, filtered_service_type='network:routed') > > > > associated_segments = set(segments) > > if None in associated_segments and len(associated_segments) > 1: > > raise segment_exc.SubnetsNotAllAssociatedWithSegments( > > network_id=network_id) > > > > SubnetsNotAllAssociatedWithSegments() is raised, as you must already > > guessed. Here's the values... > > > > associated_segments is an array containing 3 values: 2 being the IDs of > > the segments I added previously, the 3rd one being None. This test is > > then matched. Where is that None value coming from? Is this the new > > subnet I'm trying to add? Maybe the > > filtered_service_type='network:routed' in the call: > > subnet_obj.Subnet.get_subnet_segment_ids() isn't working as expected? > > > > Printing the SQL query that is checked shows: > > > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > > WHERE subnets.network_id = %(network_id_1)s AND subnets.id NOT IN > > (SELECT subnet_service_types.subnet_id AS subnet_service_types_subnet_id > > FROM subnet_service_types > > WHERE subnets.network_id = %(network_id_2)s AND > > subnet_service_types.subnet_id = subnets.id AND > > subnet_service_types.service_type = %(service_type_1)s) > > > > though when doing by hand: > > > > SELECT subnets.segment_id AS subnets_segment_id FROM subnets > > > > the db has only 2 subnets, so it looks like the floating-ip subnet got > > added before the check, and is then removed when the above test fails. > > > > So I just removed the raise, and could add the subnet I wanted, but > > that's obviously not a long term solution. > > > > Your thoughts? > > > > Another problem that I'm having, is that neutron-bgp-dragent is not > > receiving (or processing) the messages from neutron-rpc-server. I've > > enabled DEBUG mode for oslo_messaging, and found out that when dr-agent > > starts and prints "Agent has just been revived. Scheduling full sync", > > it does send a message to neutron-rpc-server, which is replied, but it > > doesn't look like dr-agent processes the return message in its reply > > queue, and then prints in the logs: "imeout in RPC method > > get_bgp_speakers. Waiting for 17 seconds before next attempt. If the > > server is not down, consider increasing the rpc_response_timeout option > > as Neutron server(s) may be overloaded and unable to respond quickly > > enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting > > for a reply to message ID c1b401c9e10d481bb5e071f2c048e480". What is > > weird is that a few times (rarely), it worked, and the agent gets the > reply. > > > > What should I do to investigate further? > > > > Cheers, > > > > Thomas Goirand (zigo) > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From CAPSEY at augusta.edu Wed Jul 15 14:17:09 2020 From: CAPSEY at augusta.edu (Apsey, Christopher) Date: Wed, 15 Jul 2020 14:17:09 +0000 Subject: [nova][dev] Revisiting qemu emulation where guest arch != host arch Message-ID: All, A few years ago I asked a question[1] about why nova, when given a hw_architecture property from glance for an image, would not end up using the correct qemu-system-xx binary when starting the guest process on a compute node if that compute nodes architecture did not match the proposed guest architecture. As an example, if we had all x86 hosts, but wanted to run an emulated ppc guest, we should be able to do that given that at least one compute node had qemu-system-ppc already installed and libvirt was successfully reporting that as a supported architecture to nova. It seemed like a heavy lift at the time, so it was put on the back burner. I am now in a position to fund a contract developer to make this happen, so the question is: would this be a useful blueprint that would potentially be accepted? Most of the time when people want to run an emulated guest they would just nest it inside of an already running guest of the native architecture, but that severely limits observability and the task of managing any more than a handful of instances in this manner quickly becomes a tangled nightmare of networking, etc. I see real benefit in allowing this scenario to run natively so all of the tooling that exists for fleet management 'just works'. This would also be a significant differentiator for OpenStack as a whole. Thoughts? [1] http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html Chris Apsey Director | Georgia Cyber Range GEORGIA CYBER CENTER 100 Grace Hopper Lane | Augusta, Georgia | 30901 https://www.gacybercenter.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Jul 15 14:36:33 2020 From: smooney at redhat.com (Sean Mooney) Date: Wed, 15 Jul 2020 15:36:33 +0100 Subject: [nova][dev] Revisiting qemu emulation where guest arch != host arch In-Reply-To: References: Message-ID: On Wed, 2020-07-15 at 14:17 +0000, Apsey, Christopher wrote: > All, > > A few years ago I asked a question[1] about why nova, when given a hw_architecture property from glance for an image, > would not end up using the correct qemu-system-xx binary when starting the guest process on a compute node if that > compute nodes architecture did not match the proposed guest architecture. As an example, if we had all x86 hosts, but > wanted to run an emulated ppc guest, we should be able to do that given that at least one compute node had qemu- > system-ppc already installed and libvirt was successfully reporting that as a supported architecture to nova. It > seemed like a heavy lift at the time, so it was put on the back burner. > > I am now in a position to fund a contract developer to make this happen, so the question is: would this be a useful > blueprint that would potentially be accepted? this came up during the ptg and the over all felling was it should really work already and if it does not its a bug. so yes i fa blueprint was filed to support emulation based on the image hw_architecture property i dont think you will get objection altough we proably will want to allso have schduler support for this and report it to placemnt or have a whigher of some kind to make it a compelte solution. i.e. enhance the virt driver to report all the achitecure it support via traits and add a weigher to prefer native execution over emulation. so placement can tell use where it can run and the weigher can say where it will run best. see line 467 https://etherpad.opendev.org/p/nova-victoria-ptg > Most of the time when people want to run an emulated guest they would just nest it inside of an already running > guest of the native architecture, but that severely limits observability and the task of managing any more than a > handful of instances in this manner quickly becomes a tangled nightmare of networking, etc. I see real benefit in > allowing this scenario to run natively so all of the tooling that exists for fleet management 'just works'. This > would also be a significant differentiator for OpenStack as a whole. > > Thoughts? > > [1] > http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html > > Chris Apsey > Director | Georgia Cyber Range > GEORGIA CYBER CENTER > > 100 Grace Hopper Lane | Augusta, Georgia | 30901 > https://www.gacybercenter.org > From nhicher at redhat.com Wed Jul 15 15:40:17 2020 From: nhicher at redhat.com (Nicolas Hicher) Date: Wed, 15 Jul 2020 11:40:17 -0400 Subject: [Tripleo] Planned outage of review.rdoproject.org: 2020-07-15 from 18:00 to 20:00 UTC Message-ID: Hello folks, Our cloud provider plans to do maintainance operation on 2020-07-15 from 18:00 to 20:00 UTC. Service interruption is expected, including: - Zuul CI not running jobs for gerrit, github or opendev. - RDO Trunk not building new packages. - DLRN API. - review.rdoproject.org and softwarefactory-project.io gerrit service. Regards, Nicolas, on behalf of the Software Factory Operation Team From hjensas at redhat.com Wed Jul 15 18:26:55 2020 From: hjensas at redhat.com (Harald Jensas) Date: Wed, 15 Jul 2020 20:26:55 +0200 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 absolutely! On Wed, 15 Jul 2020, 14:07 John Fulton, wrote: > +1 I thought he was already a core. > > On Wed, Jul 15, 2020 at 7:05 AM Cédric Jeanneret > wrote: > >> Of course +1! >> >> On 7/14/20 3:30 PM, Emilien Macchi wrote: >> > Hi folks, >> > >> > Rabi has proved deep technical understanding on the TripleO components >> > over the last years. >> > Initially as a major maintainer of the Heat project and then a regular >> > contributor to TripleO, he got involved at different levels: >> > - Optimization of the Heat templates, to reduce the number of resources >> > or improve them to make it faster and more efficient at scale. >> > - Migration of the Mistral workflows into native Ansible modules and >> > Python code into tripleo-common, with end-to-end expertise. >> > - Regular contributions to the container tooling integration. >> > >> > Being involved on the mailing-list and IRC channels, Rabi is always >> > helpful to the community and here to help. >> > He has provided thorough reviews in principal components on TripleO as >> > well as a lot of bug fixes or new features; which contributed to make >> > TripleO more stable and scalable. I would like to propose him be part of >> > the TripleO core team. >> > >> > Thanks Rabi for your hard work! >> > -- >> > Emilien Macchi >> >> -- >> Cédric Jeanneret (He/Him/His) >> Sr. Software Engineer - OpenStack Platform >> Deployment Framework TC >> Red Hat EMEA >> https://www.redhat.com/ >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Wed Jul 15 18:13:15 2020 From: thomas.king at gmail.com (Thomas King) Date: Wed, 15 Jul 2020 12:13:15 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Ruslanas, that would be excellent! I will reply to you directly for details later unless the maillist would like the full thread. Some preliminary questions: - Do you have a separate physical interface for the segment(s) used for your remote subnets? The docs state each segment must have a unique physical network name, which suggests a separate physical interface for each segment unless I'm misunderstanding something. - Are your provisioning segments all on the same Neutron network? - Are you using tagged switchports or access switchports to your Ironic server(s)? Thanks, Tom King On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis wrote: > I have deployed that with tripleO, but now we are recabling and > redeploying it. So once I have it running I can share my configs, just name > which you want :) > > On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: > >> I have. That's the Triple-O docs and they don't go through the normal >> .conf files to explain how it works outside of Triple-O. It has some ideas >> but no running configurations. >> >> Tom King >> >> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >> wrote: >> >>> hi, have you checked: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>> ? >>> I am following this link. I only have one network, having different >>> issues tho ;) >>> >>> >>> >>> On Tue, 14 Jul 2020 at 03:31, Thomas King wrote: >>> >>>> Thank you, Amy! >>>> >>>> Tom >>>> >>>> On Mon, Jul 13, 2020 at 5:19 PM Amy Marrich wrote: >>>> >>>>> Hey Tom, >>>>> >>>>> Adding the OpenStack discuss list as I think you got several replies >>>>> from there as well. >>>>> >>>>> Thanks, >>>>> >>>>> Amy (spotz) >>>>> >>>>> On Mon, Jul 13, 2020 at 5:37 PM Thomas King >>>>> wrote: >>>>> >>>>>> Good day, >>>>>> >>>>>> I'm bringing up a thread from June about DHCP relay with neutron >>>>>> networks in Ironic, specifically using unicast relay. The Triple-O docs do >>>>>> not have the plain config/neutron config to show how a regular Ironic setup >>>>>> would use DHCP relay. >>>>>> >>>>>> The Neutron segments docs state that I must have a unique physical >>>>>> network name. If my Ironic controller has a single provisioning network >>>>>> with a single physical network name, doesn't this prevent my use of >>>>>> multiple segments? >>>>>> >>>>>> Further, the segments docs state this: "The operator must ensure >>>>>> that every compute host that is supposed to participate in a router >>>>>> provider network has direct connectivity to one of its segments." (section >>>>>> 3 at >>>>>> https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html#prerequisites - >>>>>> current docs state the same thing) >>>>>> This defeats the purpose of using DHCP relay, though, where the >>>>>> Ironic controller does *not* have direct connectivity to the remote >>>>>> segment. >>>>>> >>>>>> Here is a rough drawing - what is wrong with my thinking here? >>>>>> Remote server: 10.146.30.32/27 VLAN 2116<-----> Router with DHCP >>>>>> relay <------> Ironic controller, provisioning network: >>>>>> 10.146.29.192/26 VLAN 2115 >>>>>> >>>>>> Thank you, >>>>>> Tom King >>>>>> _______________________________________________ >>>>>> openstack-mentoring mailing list >>>>>> openstack-mentoring at lists.openstack.org >>>>>> >>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-mentoring >>>>>> >>>>> >>> >>> -- >>> Ruslanas Gžibovskis >>> +370 6030 7030 >>> >> > > -- > Ruslanas Gžibovskis > +370 6030 7030 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Wed Jul 15 19:07:03 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Wed, 15 Jul 2020 22:07:03 +0300 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: Hi Thomas, I have a bit complicated setup from tripleo side :) I use only one network (only ControlPlane). thanks to Harold, he helped to make it work for me. Yes, as written in the tripleo docs for leaf networks, it use the same neutron network, different subnets. so neutron network is ctlplane (I think) and have ctlplane-subnet, remote-provision and remote-KI :)) that generates additional lines in "ip r s" output for routing "foreign" subnets through correct gw, if you would have isolated networks, by vlans and ports this would apply for each subnet different gw... I believe you know/understand that part. remote* subnets have dhcp-relay setup by network team... do not ask details for that. I do not know how to, but can ask :) in undercloud/tripleo i have 2 dhcp servers, one is for introspection, another for provide/cleanup and deployment process. all of those subnets have organization level tagged networks and are tagged on network devices, but they are untagged on provisioning interfaces/ports, as in general pxe should be untagged, but some nic's can do vlan untag on nic/bios level. but who cares!? I just did a brief check on your first post, I think I have simmilar setup to yours :)) I will check in around 12hours :)) more deaply, as will be at work :))) P.S. sorry for wrong terms, I am bad at naming. On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: > Ruslanas, that would be excellent! > > I will reply to you directly for details later unless the maillist would > like the full thread. > > Some preliminary questions: > > - Do you have a separate physical interface for the segment(s) used > for your remote subnets? > The docs state each segment must have a unique physical network name, > which suggests a separate physical interface for each segment unless I'm > misunderstanding something. > - Are your provisioning segments all on the same Neutron network? > - Are you using tagged switchports or access switchports to your > Ironic server(s)? > > Thanks, > Tom King > > On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis > wrote: > >> I have deployed that with tripleO, but now we are recabling and >> redeploying it. So once I have it running I can share my configs, just name >> which you want :) >> >> On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: >> >>> I have. That's the Triple-O docs and they don't go through the normal >>> .conf files to explain how it works outside of Triple-O. It has some ideas >>> but no running configurations. >>> >>> Tom King >>> >>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >>> wrote: >>> >>>> hi, have you checked: >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>> ? >>>> I am following this link. I only have one network, having different >>>> issues tho ;) >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From beagles at redhat.com Wed Jul 15 20:16:49 2020 From: beagles at redhat.com (Brent Eagles) Date: Wed, 15 Jul 2020 17:46:49 -0230 Subject: [tripleo] Proposing Rabi Mishra part of tripleo-core In-Reply-To: References: Message-ID: +1 definitely! On Tue, Jul 14, 2020 at 11:03 AM Emilien Macchi wrote: > Hi folks, > > Rabi has proved deep technical understanding on the TripleO components > over the last years. > Initially as a major maintainer of the Heat project and then a regular > contributor to TripleO, he got involved at different levels: > - Optimization of the Heat templates, to reduce the number of resources or > improve them to make it faster and more efficient at scale. > - Migration of the Mistral workflows into native Ansible modules and > Python code into tripleo-common, with end-to-end expertise. > - Regular contributions to the container tooling integration. > > Being involved on the mailing-list and IRC channels, Rabi is always > helpful to the community and here to help. > He has provided thorough reviews in principal components on TripleO as > well as a lot of bug fixes or new features; which contributed to make > TripleO more stable and scalable. I would like to propose him be part of > the TripleO core team. > > Thanks Rabi for your hard work! > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.king at gmail.com Wed Jul 15 21:33:35 2020 From: thomas.king at gmail.com (Thomas King) Date: Wed, 15 Jul 2020 15:33:35 -0600 Subject: [Openstack-mentoring] Neutron subnet with DHCP relay - continued In-Reply-To: References: Message-ID: That helps a lot, thank you! "I use only one network..." This bit seems to go completely against the Neutron segments documentation. When you have access, please let me know if Triple-O is using segments or some other method. I greatly appreciate this, this is a tremendous help. Tom King On Wed, Jul 15, 2020 at 1:07 PM Ruslanas Gžibovskis wrote: > Hi Thomas, > > I have a bit complicated setup from tripleo side :) I use only one network > (only ControlPlane). thanks to Harold, he helped to make it work for me. > > Yes, as written in the tripleo docs for leaf networks, it use the same > neutron network, different subnets. so neutron network is ctlplane (I > think) and have ctlplane-subnet, remote-provision and remote-KI :)) that > generates additional lines in "ip r s" output for routing "foreign" subnets > through correct gw, if you would have isolated networks, by vlans and ports > this would apply for each subnet different gw... I believe you > know/understand that part. > > remote* subnets have dhcp-relay setup by network team... do not ask > details for that. I do not know how to, but can ask :) > > > in undercloud/tripleo i have 2 dhcp servers, one is for introspection, > another for provide/cleanup and deployment process. > > all of those subnets have organization level tagged networks and are > tagged on network devices, but they are untagged on provisioning > interfaces/ports, as in general pxe should be untagged, but some nic's can > do vlan untag on nic/bios level. but who cares!? > > I just did a brief check on your first post, I think I have simmilar setup > to yours :)) I will check in around 12hours :)) more deaply, as will be at > work :))) > > > P.S. sorry for wrong terms, I am bad at naming. > > > On Wed, 15 Jul 2020, 21:13 Thomas King, wrote: > >> Ruslanas, that would be excellent! >> >> I will reply to you directly for details later unless the maillist would >> like the full thread. >> >> Some preliminary questions: >> >> - Do you have a separate physical interface for the segment(s) used >> for your remote subnets? >> The docs state each segment must have a unique physical network name, >> which suggests a separate physical interface for each segment unless I'm >> misunderstanding something. >> - Are your provisioning segments all on the same Neutron network? >> - Are you using tagged switchports or access switchports to your >> Ironic server(s)? >> >> Thanks, >> Tom King >> >> On Wed, Jul 15, 2020 at 12:26 AM Ruslanas Gžibovskis >> wrote: >> >>> I have deployed that with tripleO, but now we are recabling and >>> redeploying it. So once I have it running I can share my configs, just name >>> which you want :) >>> >>> On Tue, 14 Jul 2020 at 18:40, Thomas King wrote: >>> >>>> I have. That's the Triple-O docs and they don't go through the normal >>>> .conf files to explain how it works outside of Triple-O. It has some ideas >>>> but no running configurations. >>>> >>>> Tom King >>>> >>>> On Tue, Jul 14, 2020 at 3:01 AM Ruslanas Gžibovskis >>>> wrote: >>>> >>>>> hi, have you checked: >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html >>>>> ? >>>>> I am following this link. I only have one network, having different >>>>> issues tho ;) >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Thu Jul 16 05:38:55 2020 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 15 Jul 2020 22:38:55 -0700 Subject: [manila] No IRC meeting on 16th July 2020 In-Reply-To: References: Message-ID: Hello Zorillas and Interested Stackers, I clearly missed the overlap our IRC meeting had with the OpenStack Community Meeting (invite/information in the latter part of this email). This is an important one for all of us to attend if we can, and toast this amazing community, of which we are a small part. There were no new agenda items added for this week's meeting, so we'll push any discussion items to the next meeting. In the meantime, please note: - a kernel bug in Ubuntu 18.04 [1][2] currently causes test dsvm nodes on RAX to reboot when running the LVM job. We're currently skipping scenario tests in the LVM job to workaround the issue. This bug has been fixed in a new kernel version, we'll re-enable scenario tests when we don't see the issue occurring on RAX. - Manila's new driver deadline is the week of Jul 27 - Jul 31. Please interact with us on #openstack-manila should you have any concern with this deadline. - please review the specifications, they will need to be merged before the next week's meeting Hope to see you at the community meeting! Goutham Pacha Ravi [1] https://launchpad.net/bugs/1886988 [2] https://launchpad.net/bugs/1886668 ---------- Forwarded message --------- From: Sunny Cai Date: Wed, Jul 8, 2020 at 2:38 PM Subject: July OSF Community Meeting - 10 Years of OpenStack To: Hello everyone, You might have heard that OpenStack is turning 10 this year! On *Thursday*, *July 16 at 8am PT (1500 UTC)*, we will be holding the 10 years of OpenStack virtual celebration in the July OSF community meeting. I have attached the calendar invite for the July OSF community meeting below. Grab your favorite OpenStack swag and bring your favorite drinks of choice to the meeting on July 16. Let’s do a virtual toast to the 10 incredible years! Please see the etherpad for more meeting information: https://etherpad.opendev.org/p/tTP9ilsAaJ2E8vMnm6uV If you have any questions, please let me know. P.S. To add more fun, feel free to try out the virtual background feature in Zoom. The 10 years of OpenStack virtual background is attached below. Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruslanas at lpic.lt Thu Jul 16 08:38:07 2020 From: ruslanas at lpic.lt (=?UTF-8?Q?Ruslanas_G=C5=BEibovskis?=) Date: Thu, 16 Jul 2020 10:38:07 +0200 Subject: [tripleo][centos8][ussuri][horizon] horizon container fails to start Message-ID: Hi all, I have noticed, that horizon container fails to start and some interestin zen_wozniak has apeared [0]. Healthcheck log is empty, but horizon log [1] sais "/usr/bin/python: No such file or directory" and there is no such file or directory :) after sume update it failed. I believe you guys will push update fast enough, as I am still bad at this git and container part.... HOW to fix it now :) on my side? As tripleo will redeploy horizon from images... and will update image. could you please give me a hint where to duck tape it whille it will be pushed to prod? [0] http://paste.openstack.org/show/3jjnsgXfWRxs3o0G6aKH/ [1] http://paste.openstack.org/show/1S66A55cz0UaFUWGxID8/ -- Ruslanas Gžibovskis +370 6030 7030 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Thu Jul 16 12:56:51 2020 From: zigo at debian.org (Thomas Goirand) Date: Thu, 16 Jul 2020 14:56:51 +0200 Subject: Floating IP's for routed networks In-Reply-To: References: <09e8e64c-5e02-45d4-b141-85d2725037d3@infomaniak.com> <8f4abd73-b9e9-73a9-6f3a-60114aed5a61@infomaniak.com> <73504637-23a3-c591-a1cc-c465803abe2b@infomaniak.com> <2127d0f0-03b2-7af7-6381-7a3e0ca72ced@infomaniak.com> Message-ID: <007d6225-12ef-69d7-6c76-45c093909297@debian.org> On 7/15/20 4:09 PM, Rodolfo Alonso Hernandez wrote: > Hi Thomas: > > If I'm not wrong, the goal of this filtering is to remove all those > subnets with service_type='network:routed'. Maybe you can check > implementing an easier query: > SELECT subnets.segment_id AS subnets_segment_id > FROM subnets > WHERE subnets.network_id = %(network_id_1)s AND NOT (EXISTS (SELECT * > FROM subnet_service_types > WHERE subnets.id = subnet_service_types.subnet_id > AND subnet_service_types.service_type = %(service_type_1)s)) > > That will be translated to python as: > > query = test_db.context.session.query(subnet_obj.Subnet.db_model.segment_id) > query = query.filter(subnet_obj.Subnet.db_model.network_id == network_id) > if filtered_service_type: > query = query.filter(~exists().where(and_( > subnet_obj.Subnet.db_model.id == service_type_model.subnet_id, > service_type_model.service_type == filtered_service_type))) > > Can you provide a UTs or a way to check the problem you are experiencing? > > Regards. Hi Rodolfo, Thanks for your help. I tried translating what you wrote above into a working code (ie: fixing a few variables here and there), which I sent as a new PR here: https://review.opendev.org/#/c/741429/ However, printing the result from SQLAlchemy shows that get_subnet_segment_ids() still returns None together with my other 2 subnets, so something must still be wrong. I'm not yet to the point I can write unit tests, just trying the code locally for the moment. Cheers, Thomas Goirand (zigo) From jasowang at redhat.com Thu Jul 16 04:16:26 2020 From: jasowang at redhat.com (Jason Wang) Date: Thu, 16 Jul 2020 12:16:26 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200713232957.GD5955@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> Message-ID: <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> On 2020/7/14 上午7:29, Yan Zhao wrote: > hi folks, > we are defining a device migration compatibility interface that helps upper > layer stack like openstack/ovirt/libvirt to check if two devices are > live migration compatible. > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > e.g. we could use it to check whether > - a src MDEV can migrate to a target MDEV, > - a src VF in SRIOV can migrate to a target VF in SRIOV, > - a src MDEV can migration to a target VF in SRIOV. > (e.g. SIOV/SRIOV backward compatibility case) > > The upper layer stack could use this interface as the last step to check > if one device is able to migrate to another device before triggering a real > live migration procedure. > we are not sure if this interface is of value or help to you. please don't > hesitate to drop your valuable comments. > > > (1) interface definition > The interface is defined in below way: > > __ userspace > /\ \ > / \write > / read \ > ________/__________ ___\|/_____________ > | migration_version | | migration_version |-->check migration > --------------------- --------------------- compatibility > device A device B > > > a device attribute named migration_version is defined under each device's > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). Are you aware of the devlink based device management interface that is proposed upstream? I think it has many advantages over sysfs, do you consider to switch to that? > userspace tools read the migration_version as a string from the source device, > and write it to the migration_version sysfs attribute in the target device. > > The userspace should treat ANY of below conditions as two devices not compatible: > - any one of the two devices does not have a migration_version attribute > - error when reading from migration_version attribute of one device > - error when writing migration_version string of one device to > migration_version attribute of the other device > > The string read from migration_version attribute is defined by device vendor > driver and is completely opaque to the userspace. My understanding is that something opaque to userspace is not the philosophy of Linux. Instead of having a generic API but opaque value, why not do in a vendor specific way like: 1) exposing the device capability in a vendor specific way via sysfs/devlink or other API 2) management read capability in both src and dst and determine whether we can do the migration This is the way we plan to do with vDPA. Thanks > for a Intel vGPU, string format can be defined like > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > for an NVMe VF connecting to a remote storage. it could be > "PCI ID" + "driver version" + "configured remote storage URL" > > for a QAT VF, it may be > "PCI ID" + "driver version" + "supported encryption set". > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > (2) backgrounds > > The reason we hope the migration_version string is opaque to the userspace > is that it is hard to generalize standard comparing fields and comparing > methods for different devices from different vendors. > Though userspace now could still do a simple string compare to check if > two devices are compatible, and result should also be right, it's still > too limited as it excludes the possible candidate whose migration_version > string fails to be equal. > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > with another MDEV with mdev_type_3, aggregator count 1, even their > migration_version strings are not equal. > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > besides that, driver version + configured resources are all elements demanding > to take into account. > > So, we hope leaving the freedom to vendor driver and let it make the final decision > in a simple reading from source side and writing for test in the target side way. > > > we then think the device compatibility issues for live migration with assigned > devices can be divided into two steps: > a. management tools filter out possible migration target devices. > Tags could be created according to info from product specification. > we think openstack/ovirt may have vendor proprietary components to create > those customized tags for each product from each vendor. > e.g. > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > search target vGPU are like: > a tag for compatible parent PCI IDs, > a tag for a range of gvt driver versions, > a tag for a range of mdev type + aggregator count > > for NVMe VF, the tags to search target VF may be like: > a tag for compatible PCI IDs, > a tag for a range of driver versions, > a tag for URL of configured remote storage. > > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > device migration compatibility interface to make sure the two devices are > indeed live migration compatible before launching the real live migration > process to start stream copying, src device stopping and target device > resuming. > It is supposed that this step would not bring any performance penalty as > -in kernel it's just a simple string decoding and comparing > -in openstack/ovirt, it could be done by extending current function > check_can_live_migrate_destination, along side claiming target resources.[1] > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > Thanks > Yan > From yan.y.zhao at intel.com Thu Jul 16 08:32:30 2020 From: yan.y.zhao at intel.com (Yan Zhao) Date: Thu, 16 Jul 2020 16:32:30 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> Message-ID: <20200716083230.GA25316@joy-OptiPlex-7040> On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: > > On 2020/7/14 上午7:29, Yan Zhao wrote: > > hi folks, > > we are defining a device migration compatibility interface that helps upper > > layer stack like openstack/ovirt/libvirt to check if two devices are > > live migration compatible. > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > e.g. we could use it to check whether > > - a src MDEV can migrate to a target MDEV, > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > - a src MDEV can migration to a target VF in SRIOV. > > (e.g. SIOV/SRIOV backward compatibility case) > > > > The upper layer stack could use this interface as the last step to check > > if one device is able to migrate to another device before triggering a real > > live migration procedure. > > we are not sure if this interface is of value or help to you. please don't > > hesitate to drop your valuable comments. > > > > > > (1) interface definition > > The interface is defined in below way: > > > > __ userspace > > /\ \ > > / \write > > / read \ > > ________/__________ ___\|/_____________ > > | migration_version | | migration_version |-->check migration > > --------------------- --------------------- compatibility > > device A device B > > > > > > a device attribute named migration_version is defined under each device's > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > Are you aware of the devlink based device management interface that is > proposed upstream? I think it has many advantages over sysfs, do you > consider to switch to that? not familiar with the devlink. will do some research of it. > > > > userspace tools read the migration_version as a string from the source device, > > and write it to the migration_version sysfs attribute in the target device. > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > - any one of the two devices does not have a migration_version attribute > > - error when reading from migration_version attribute of one device > > - error when writing migration_version string of one device to > > migration_version attribute of the other device > > > > The string read from migration_version attribute is defined by device vendor > > driver and is completely opaque to the userspace. > > > My understanding is that something opaque to userspace is not the philosophy but the VFIO live migration in itself is essentially a big opaque stream to userspace. > of Linux. Instead of having a generic API but opaque value, why not do in a > vendor specific way like: > > 1) exposing the device capability in a vendor specific way via sysfs/devlink > or other API > 2) management read capability in both src and dst and determine whether we > can do the migration > > This is the way we plan to do with vDPA. > yes, in another reply, Alex proposed to use an interface in json format. I guess we can define something like { "self" : [ { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v1", "mdev_type" : "i915-GVTg_V5_2", "aggregator" : "1", "pv-mode" : "none", } ], "compatible" : [ { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v1", "mdev_type" : "i915-GVTg_V5_2", "aggregator" : "1" "pv-mode" : "none", }, { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v1", "mdev_type" : "i915-GVTg_V5_4", "aggregator" : "2" "pv-mode" : "none", }, { "pciid" : "8086591d", "driver" : "i915", "gvt-version" : "v2", "mdev_type" : "i915-GVTg_V5_4", "aggregator" : "2" "pv-mode" : "none, ppgtt, context", } ... ] } But as those fields are mostly vendor specific, the userspace can only do simple string comparing, I guess the list would be very long as it needs to enumerate all possible targets. also, in some fileds like "gvt-version", is there a simple way to express things like v2+? If the userspace can read this interface both in src and target and check whether both src and target are in corresponding compatible list, I think it will work for us. But still, kernel should not rely on userspace's choice, the opaque compatibility string is still required in kernel. No matter whether it would be exposed to userspace as an compatibility checking interface, vendor driver would keep this part of code and embed the string into the migration stream. so exposing it as an interface to be used by libvirt to do a safety check before a real live migration is only about enabling the kernel part of check to happen ahead. Thanks Yan > > > > for a Intel vGPU, string format can be defined like > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > for an NVMe VF connecting to a remote storage. it could be > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > for a QAT VF, it may be > > "PCI ID" + "driver version" + "supported encryption set". > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > (2) backgrounds > > > > The reason we hope the migration_version string is opaque to the userspace > > is that it is hard to generalize standard comparing fields and comparing > > methods for different devices from different vendors. > > Though userspace now could still do a simple string compare to check if > > two devices are compatible, and result should also be right, it's still > > too limited as it excludes the possible candidate whose migration_version > > string fails to be equal. > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible > > with another MDEV with mdev_type_3, aggregator count 1, even their > > migration_version strings are not equal. > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > besides that, driver version + configured resources are all elements demanding > > to take into account. > > > > So, we hope leaving the freedom to vendor driver and let it make the final decision > > in a simple reading from source side and writing for test in the target side way. > > > > > > we then think the device compatibility issues for live migration with assigned > > devices can be divided into two steps: > > a. management tools filter out possible migration target devices. > > Tags could be created according to info from product specification. > > we think openstack/ovirt may have vendor proprietary components to create > > those customized tags for each product from each vendor. > > e.g. > > for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to > > search target vGPU are like: > > a tag for compatible parent PCI IDs, > > a tag for a range of gvt driver versions, > > a tag for a range of mdev type + aggregator count > > > > for NVMe VF, the tags to search target VF may be like: > > a tag for compatible PCI IDs, > > a tag for a range of driver versions, > > a tag for URL of configured remote storage. > > > > b. with the output from step a, openstack/ovirt/libvirt could use our proposed > > device migration compatibility interface to make sure the two devices are > > indeed live migration compatible before launching the real live migration > > process to start stream copying, src device stopping and target device > > resuming. > > It is supposed that this step would not bring any performance penalty as > > -in kernel it's just a simple string decoding and comparing > > -in openstack/ovirt, it could be done by extending current function > > check_can_live_migrate_destination, along side claiming target resources.[1] > > > > > > [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html > > > > Thanks > > Yan > > > From jasowang at redhat.com Thu Jul 16 09:30:41 2020 From: jasowang at redhat.com (Jason Wang) Date: Thu, 16 Jul 2020 17:30:41 +0800 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200716083230.GA25316@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> Message-ID: On 2020/7/16 下午4:32, Yan Zhao wrote: > On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: >> On 2020/7/14 上午7:29, Yan Zhao wrote: >>> hi folks, >>> we are defining a device migration compatibility interface that helps upper >>> layer stack like openstack/ovirt/libvirt to check if two devices are >>> live migration compatible. >>> The "devices" here could be MDEVs, physical devices, or hybrid of the two. >>> e.g. we could use it to check whether >>> - a src MDEV can migrate to a target MDEV, >>> - a src VF in SRIOV can migrate to a target VF in SRIOV, >>> - a src MDEV can migration to a target VF in SRIOV. >>> (e.g. SIOV/SRIOV backward compatibility case) >>> >>> The upper layer stack could use this interface as the last step to check >>> if one device is able to migrate to another device before triggering a real >>> live migration procedure. >>> we are not sure if this interface is of value or help to you. please don't >>> hesitate to drop your valuable comments. >>> >>> >>> (1) interface definition >>> The interface is defined in below way: >>> >>> __ userspace >>> /\ \ >>> / \write >>> / read \ >>> ________/__________ ___\|/_____________ >>> | migration_version | | migration_version |-->check migration >>> --------------------- --------------------- compatibility >>> device A device B >>> >>> >>> a device attribute named migration_version is defined under each device's >>> sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). >> >> Are you aware of the devlink based device management interface that is >> proposed upstream? I think it has many advantages over sysfs, do you >> consider to switch to that? > not familiar with the devlink. will do some research of it. >> >>> userspace tools read the migration_version as a string from the source device, >>> and write it to the migration_version sysfs attribute in the target device. >>> >>> The userspace should treat ANY of below conditions as two devices not compatible: >>> - any one of the two devices does not have a migration_version attribute >>> - error when reading from migration_version attribute of one device >>> - error when writing migration_version string of one device to >>> migration_version attribute of the other device >>> >>> The string read from migration_version attribute is defined by device vendor >>> driver and is completely opaque to the userspace. >> >> My understanding is that something opaque to userspace is not the philosophy > but the VFIO live migration in itself is essentially a big opaque stream to userspace. I think it's better not limit to the kernel interface for a specific use case. This is basically the device introspection. > >> of Linux. Instead of having a generic API but opaque value, why not do in a >> vendor specific way like: >> >> 1) exposing the device capability in a vendor specific way via sysfs/devlink >> or other API >> 2) management read capability in both src and dst and determine whether we >> can do the migration >> >> This is the way we plan to do with vDPA. >> > yes, in another reply, Alex proposed to use an interface in json format. > I guess we can define something like > > { "self" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1", > "pv-mode" : "none", > } > ], > "compatible" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v2", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none, ppgtt, context", > } > ... > ] > } This is probably another call for devlink base interface. > > But as those fields are mostly vendor specific, the userspace can > only do simple string comparing, I guess the list would be very long as > it needs to enumerate all possible targets. > also, in some fileds like "gvt-version", is there a simple way to express > things like v2+? That's total vendor specific I think. If "v2+" means it only support a version 2+, we can introduce fields like min_version and max_version. But again, the point is to let such interfaces vendor specific instead of trying to have a generic format. > > If the userspace can read this interface both in src and target and > check whether both src and target are in corresponding compatible list, I > think it will work for us. > > But still, kernel should not rely on userspace's choice, the opaque > compatibility string is still required in kernel. No matter whether > it would be exposed to userspace as an compatibility checking interface, > vendor driver would keep this part of code and embed the string into the > migration stream. Why? Can we simply do: 1) Src support feature A, B, C  (version 1.0) 2) Dst support feature A, B, C, D (version 2.0) 3) only enable feature A, B, C in destination in a version specific way (set version to 1.0) 4) migrate metadata A, B, C > so exposing it as an interface to be used by libvirt to > do a safety check before a real live migration is only about enabling > the kernel part of check to happen ahead. If we've already exposed the capability, there's no need for an extra check like compatibility string. Thanks > > > Thanks > Yan > > >> >>> for a Intel vGPU, string format can be defined like >>> "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". >>> >>> for an NVMe VF connecting to a remote storage. it could be >>> "PCI ID" + "driver version" + "configured remote storage URL" >>> >>> for a QAT VF, it may be >>> "PCI ID" + "driver version" + "supported encryption set". >>> >>> (to avoid namespace confliction from each vendor, we may prefix a driver name to >>> each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) >>> >>> >>> (2) backgrounds >>> >>> The reason we hope the migration_version string is opaque to the userspace >>> is that it is hard to generalize standard comparing fields and comparing >>> methods for different devices from different vendors. >>> Though userspace now could still do a simple string compare to check if >>> two devices are compatible, and result should also be right, it's still >>> too limited as it excludes the possible candidate whose migration_version >>> string fails to be equal. >>> e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible >>> with another MDEV with mdev_type_3, aggregator count 1, even their >>> migration_version strings are not equal. >>> (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). >>> >>> besides that, driver version + configured resources are all elements demanding >>> to take into account. >>> >>> So, we hope leaving the freedom to vendor driver and let it make the final decision >>> in a simple reading from source side and writing for test in the target side way. >>> >>> >>> we then think the device compatibility issues for live migration with assigned >>> devices can be divided into two steps: >>> a. management tools filter out possible migration target devices. >>> Tags could be created according to info from product specification. >>> we think openstack/ovirt may have vendor proprietary components to create >>> those customized tags for each product from each vendor. >>> e.g. >>> for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to >>> search target vGPU are like: >>> a tag for compatible parent PCI IDs, >>> a tag for a range of gvt driver versions, >>> a tag for a range of mdev type + aggregator count >>> >>> for NVMe VF, the tags to search target VF may be like: >>> a tag for compatible PCI IDs, >>> a tag for a range of driver versions, >>> a tag for URL of configured remote storage. >>> >>> b. with the output from step a, openstack/ovirt/libvirt could use our proposed >>> device migration compatibility interface to make sure the two devices are >>> indeed live migration compatible before launching the real live migration >>> process to start stream copying, src device stopping and target device >>> resuming. >>> It is supposed that this step would not bring any performance penalty as >>> -in kernel it's just a simple string decoding and comparing >>> -in openstack/ovirt, it could be done by extending current function >>> check_can_live_migrate_destination, along side claiming target resources.[1] >>> >>> >>> [1] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/libvirt-neutron-sriov-livemigration.html >>> >>> Thanks >>> Yan >>> From arnaud.morin at gmail.com Thu Jul 16 13:31:27 2020 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 16 Jul 2020 13:31:27 +0000 Subject: [largescale-sig] OpenStack DB Archiver Message-ID: <20200716133127.GA31915@sync> Hello large-scalers! TLDR: we opensource a tool to help reducing size of databases. See https://github.com/ovh/osarchiver/ Few months ago, we released a tool, name osarchiver, which we are using on our production environment (at OVH) to help reduce the size of our tables in mariadb (or mysql) In fact, some tables are well know to grow very quickly. We use it, for example, to clean the OpenStack mistral database from old tasks, actions and executions which are older than a year. Another use case could be to archive some data in another table (e.g. with _archived as suffix) if they are 6 months old, and delete this data after 1 year. The source code of this tool is available here: https://github.com/ovh/osarchiver/ We were wondering if some other users would be interested in using the tool, and maybe move it under the opendev governance? Feel free to contact us and/or answer this thread. Cheers, -- Arnaud, Pierre-Samuel and OVH team From sunny at openstack.org Thu Jul 16 14:03:38 2020 From: sunny at openstack.org (Sunny Cai) Date: Thu, 16 Jul 2020 07:03:38 -0700 Subject: 10 Years of OpenStack Celebration - 8:00am PST (1500 UTC) Message-ID: <44F2A352-8155-4558-AF36-7A1BBB71F7AB@openstack.org> Hello everyone, The 10 years of OpenStack virtual celebration is starting in one hour! Join us and many of the original Stackers who helped form the project back in 2010 to celebrate the past 10 years. The meeting starts today at 8:00am PST (1500 UTC). Please see the etherpad for more meeting information: https://etherpad.opendev.org/p/tTP9ilsAaJ2E8vMnm6uV Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Thu Jul 16 15:09:47 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Thu, 16 Jul 2020 17:09:47 +0200 Subject: Kolla Klub Message-ID: Hi Folks, The Kolla Klub is on. Sorry for the late reminder. The meeting url has changed a bit because it cannot be hosted by Mark today. Meeting link: https://meet.google.com/bpx-ymco-cfy Kolla Klub in docs: https://docs.google.com/document/d/1EwQs2GXF-EvJZamEx9vQAOSDB5tCjsDCJyHQN5_4_Sw -yoctozepto From sunny at openstack.org Thu Jul 16 18:39:22 2020 From: sunny at openstack.org (Sunny Cai) Date: Thu, 16 Jul 2020 11:39:22 -0700 Subject: 10 years of OpenStack virtual celebration - recordings Message-ID: Hello everyone, We just had the 10 years of OpenStack virtual celebration with the community members around the world. It was a huge success and thanks to everyone who have joined the community meeting. If you have missed the meeting or what a replay, here you can find the meeting recording and the slide deck: 10 years of OpenStack celebration recording: https://www.youtube.com/watch?v=QYhK0219LIk&feature=youtu.be Slide deck: https://docs.google.com/presentation/d/1bPJYOGVDypcXiNaddoPY9o1Wh-thZeuL-y8PPN-ugtY/edit?usp=sharing Check out the 10 years of OpenStack blog here: https://www.openstack.org/blog/thank-you-to-the-last-decade-hello-to-the-next/ Here I have attached a few screenshots from the virtual celebration. Happy 10 years of OpenStack and take care! Thanks, Sunny Cai OpenStack Foundation sunny at openstack.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.18 AM.png Type: image/png Size: 3111776 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.25 AM.png Type: image/png Size: 3265961 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.32 AM.png Type: image/png Size: 2672013 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.47 AM.png Type: image/png Size: 3397441 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-07-16 at 11.01.53 AM.png Type: image/png Size: 3111248 bytes Desc: not available URL: From anilj.mailing at gmail.com Thu Jul 16 21:41:08 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Thu, 16 Jul 2020 14:41:08 -0700 Subject: RabbitMQ consumer connection is refused when trying to read notifications Message-ID: Hi, I followed the video and the steps provided in this video link and the consumer connection is being refused. https://www.openstack.org/videos/summits/denver-2019/nova-versioned-notifications-the-result-of-a-3-year-journey /etc/nova/nova.conf file changes.. [notifications] notify_on_state_change=vm_state default_level=INFO notification_format=both [oslo_messaging_notifications] driver=messagingv2 transport_url=rabbit://guest:guest at 10.30.8.57:5672/ topics=notification retry=-1 The python consume code is as follows (followed the example provided in the video: transport = oslo_messaging.get_notification_transport( cfg.CONF, url='rabbit://guest:guest at 10.30.8.57:5672/') targets = [ oslo_messaging.Target(topic='versioned_notifications'), ] Am I missing any other configuration in any of the services in OpenStack? Let me know if you need any other info. /anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu Jul 16 21:45:32 2020 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 16 Jul 2020 16:45:32 -0500 Subject: [TripleO]Documentation to list all options in yaml file and possible values In-Reply-To: References: Message-ID: <0c59330a-8b7e-694e-4918-340ea4031db7@nemebean.com> /me looks sadly at the unfinished https://specs.openstack.org/openstack/tripleo-specs/specs/pike/environment-generator.html One of the goals of that spec was to provide something like this. AFAIK it was never completed (I certainly didn't, because...reasons). There are a few environments using it, but not most and as a result the goal to have every parameter for every service documented was never realized. Disclaimer: I haven't worked on TripleO in years, so it's possible something else has happened since then to address this. On 7/9/20 11:50 AM, Ruslanas Gžibovskis wrote: > Hi all, > > 1) Is there a page or a draft, where all options of TripleO are available? > 2) Is there a page or a draft, where dependencies of each option are listed? > 3) Is there a page or a draft, where all possible values for each option > would be listed? > > -- > Ruslanas Gžibovskis > +370 6030 7030 From pierre-samuel.le-stang at corp.ovh.com Fri Jul 17 08:53:27 2020 From: pierre-samuel.le-stang at corp.ovh.com (Pierre-Samuel LE STANG) Date: Fri, 17 Jul 2020 10:53:27 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <21b85e64-5cbf-d6e1-a739-50b74d9585a2@goirand.fr> References: <20200716133127.GA31915@sync> <21b85e64-5cbf-d6e1-a739-50b74d9585a2@goirand.fr> Message-ID: <20200717085327.huq7ztefn7gkec5x@corp.ovh.com> Thomas Goirand wrote on ven. [2020-juil.-17 09:47:22 +0200]: > On 7/16/20 3:31 PM, Arnaud Morin wrote: > > Hello large-scalers! > > > > TLDR: we opensource a tool to help reducing size of databases. > > See https://github.com/ovh/osarchiver/ > > > > > > Few months ago, we released a tool, name osarchiver, which we are using > > on our production environment (at OVH) to help reduce the size of our > > tables in mariadb (or mysql) > > > > In fact, some tables are well know to grow very quickly. > > > > We use it, for example, to clean the OpenStack mistral database from old > > tasks, actions and executions which are older than a year. > > > > Another use case could be to archive some data in another table (e.g. with > > _archived as suffix) if they are 6 months old, and delete this data after > > 1 year. > > > > The source code of this tool is available here: > > https://github.com/ovh/osarchiver/ > > > > We were wondering if some other users would be interested in using the > > tool, and maybe move it under the opendev governance? > > > > Feel free to contact us and/or answer this thread. > > > > Cheers, > > Hi, > > That's very nice, thanks a lot for releasing such a thing. > > However, there's room for improvement if you would like to see your tool > shipped everywhere: > > - please define a requirements.txt > - please get the debian folder away from the main master branch, > especially considering it's using dh_virtualenv !!! > - please tag with a release number > > Also, with what release of OpenStack has this been tested? Is this bound > to a specific release? > > Cheers, > > Thomas Goirand (zigo) Hi Thomas, Thanks for your answer. We will update the repository accordingly. We tested OSArchiver on Newton and Stein releases of OpenStack. By design the tool is agnostic and rely only on the presence of the 'deleted_at' column so for now we do not expect to be bound to a specific release. Best regards, -- PS From thierry at openstack.org Fri Jul 17 13:11:01 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 17 Jul 2020 15:11:01 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: Arnaud Morin wrote: > [...] > The source code of this tool is available here: > https://github.com/ovh/osarchiver/ Thanks for sharing this tool! > We were wondering if some other users would be interested in using the > tool, and maybe move it under the opendev governance? I think this is one of those small operational tools that everyone ends up reinventing in their corner, duplicating effort. I support the idea of pushing it upstream, as it will make it easier for others to improve it, but also make the tool more discoverable. In terms of governance, we have several paths we could follow: 1/ we could create a specific project team to maintain this. That sounds completely overkill given the scope and size of the tool, and the fact that it's mostly feature-complete. Project teams are great to produce new "openstack" service components, but this is more peripheral operational tooling that would be released independently. 2/ we could adopt it at the Large Scale SIG, and promote it from there. I feel like this is useful beyond Large scale deployments though, so that sounds suboptimal 3/ during the last Opendev event we discussed reviving the OSops[1] idea: a lightweight area where operators can share the various small tools that they end up creating to help them operate OpenStack deployments. The effort has been dormant for a few years. I personally think the last option is the best, even if we need to figure a few things out before we can land this. I'll start a separate thread on OSops, and depending on how that goes, we'll choose between option 3 or option 2 for osarchiver. [1] https://wiki.openstack.org/wiki/Osops -- Thierry Carrez (ttx) From thierry at openstack.org Fri Jul 17 13:19:55 2020 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 17 Jul 2020 15:19:55 +0200 Subject: [ops] Reviving OSOps ? In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: <2570db1a-874f-a503-bcb7-95b6d4ce3312@openstack.org> Hi everyone, During the last Opendev event we discussed reviving the OSops[1] idea: a lightweight area where operators can share the various small tools that they end up creating to help them operate OpenStack deployments. The effort has been mostly dormant for a few years. We had a recent thread[2] about osarchiver, a new operators helper, and whether it would make sense to push it upstream. I think the best option would be to revive OSops and land it there. Who is interested in helping to revive/maintain this ? If we revive it, I think we should move its repositories away from the catch-all "x" directory under opendev, which was created for projects that were not claimed by anyone during the big migration. If Osops should be considered distinct from OpenStack, then I'd recommend giving it its own opendev top directory, and move existing x/osops-* repositories to osops/*. If we'd like to make OSops a product of the OpenStack community (and have contributions to it be fully recognized as contributions to "OpenStack"), then I'd recommend creating a specific SIG dedicated to this, and move the x/osops-* repositories to openstack/osops-*. [1] https://wiki.openstack.org/wiki/Osops [2] http://lists.openstack.org/pipermail/openstack-discuss/2020-July/015977.html -- Thierry Carrez (ttx) From ignaziocassano at gmail.com Fri Jul 17 16:55:01 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 18:55:01 +0200 Subject: [openstack][octavia] transparent Message-ID: Hello all, I have some end users who want to receive on their load balanced web servers the client ip address for acl. They also want the https connection is terminated on web servers and not on load balancer. Can I solve with octavia ? I read haproxy can act as transparent only when it is the default router of backends. In our use case the default router is not the load balancer. Any help, please? Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Fri Jul 17 17:17:46 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 17 Jul 2020 10:17:46 -0700 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: Hi Ignazio, Currently the amphora driver does not support passing the client source IP directly to the backend member server. However there are a few ways to accomplish this using the amphora driver: 1. Use the proxy protocol for the pool. 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For header. To use the PROXY protocol you would set up the load balancer like this: 1. Create the load balancer. 2. Create the listener using HTTPS pass through, so either the "HTTPS" or "TCP" protocol. 3. Create the pool using the "PROXY" protocol option. 4. Add your members and health manager as you normally do. Then, on the web servers enable PROXY protocol. On apache this is via the mod_remoteip module and the RemoteIPProxyProtocol directive. See: https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol On nginx it is enabled with the "proxy_protocol" directive. See: https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ Pretty much every web server has support for it. Michael On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano wrote: > > Hello all, I have some end users who want to receive on their load balanced web servers the client ip address for acl. > They also want the https connection is terminated on web servers and not on load balancer. > Can I solve with octavia ? > I read haproxy can act as transparent only when it is the default router of backends. > In our use case the default router is not the load balancer. > Any help, please? > Ignazio > From fungi at yuggoth.org Fri Jul 17 17:32:20 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jul 2020 17:32:20 +0000 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: <20200717173219.hv2vahdznwyjf3k7@yuggoth.org> On 2020-07-17 18:55:01 +0200 (+0200), Ignazio Cassano wrote: > Hello all, I have some end users who want to receive on their load > balanced web servers the client ip address for acl. They also want > the https connection is terminated on web servers and not on load > balancer. Can I solve with octavia ? I read haproxy can act as > transparent only when it is the default router of backends. In our > use case the default router is not the load balancer. Any help, > please? You'll be hard pressed to find any network load balancer which can satisfy this combination of requirements without also requiring some cooperation from the gateway. The ways you typically get the client IP addresses to your servers are one of: 1. Use the load balancer as the default router for the servers so that it doesn't need to alter the IP addresses of the packets (layer 3 forwarding). 2. Terminate SSL/TLS on the load balancer so that it can insert X-Forwarded-For headers into the HTTP requests, and then optionally re-encrypt when sending along to the servers (layer 7 forwarding). 3. A "direct server return" configuration where the load balancer masquerades as the clients and only handles the inbound packets to the servers, while the outbound replies from the servers go directly to the Internet through their default gateway (asymmetric layer 3 forwarding with destination NAT). This is the only option which meets the list of requirements you posed and it's exceptionally messy to implement, since you can't rely on state tracking either on the load balancer or the default gateway (each of them only sees half of the connection). This can also thoroughly confuse your packet filtering depending on where in your network it's applied. A bit of quick searching doesn't turn up any available amphorae for Octavia which support DSR, but even if there were I expect you'd face challenges adapting Neutron and security groups to handle it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ignaziocassano at gmail.com Fri Jul 17 17:55:12 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 19:55:12 +0200 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: Many thanks, Michael. Ignazio Il Ven 17 Lug 2020, 19:17 Michael Johnson ha scritto: > Hi Ignazio, > > Currently the amphora driver does not support passing the client > source IP directly to the backend member server. > > However there are a few ways to accomplish this using the amphora driver: > 1. Use the proxy protocol for the pool. > 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For > header. > > To use the PROXY protocol you would set up the load balancer like this: > 1. Create the load balancer. > 2. Create the listener using HTTPS pass through, so either the "HTTPS" > or "TCP" protocol. > 3. Create the pool using the "PROXY" protocol option. > 4. Add your members and health manager as you normally do. > > Then, on the web servers enable PROXY protocol. > On apache this is via the mod_remoteip module and the > RemoteIPProxyProtocol directive. See: > > https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol > On nginx it is enabled with the "proxy_protocol" directive. See: > > https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ > > Pretty much every web server has support for it. > > Michael > > On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano > wrote: > > > > Hello all, I have some end users who want to receive on their load > balanced web servers the client ip address for acl. > > They also want the https connection is terminated on web servers and not > on load balancer. > > Can I solve with octavia ? > > I read haproxy can act as transparent only when it is the default router > of backends. > > In our use case the default router is not the load balancer. > > Any help, please? > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Jul 17 18:20:10 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 20:20:10 +0200 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: Hello Michael, I forgot to ask if the configuration you suggested can support acl for clients ip address. Ignazio Il Ven 17 Lug 2020, 19:17 Michael Johnson ha scritto: > Hi Ignazio, > > Currently the amphora driver does not support passing the client > source IP directly to the backend member server. > > However there are a few ways to accomplish this using the amphora driver: > 1. Use the proxy protocol for the pool. > 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For > header. > > To use the PROXY protocol you would set up the load balancer like this: > 1. Create the load balancer. > 2. Create the listener using HTTPS pass through, so either the "HTTPS" > or "TCP" protocol. > 3. Create the pool using the "PROXY" protocol option. > 4. Add your members and health manager as you normally do. > > Then, on the web servers enable PROXY protocol. > On apache this is via the mod_remoteip module and the > RemoteIPProxyProtocol directive. See: > > https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol > On nginx it is enabled with the "proxy_protocol" directive. See: > > https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ > > Pretty much every web server has support for it. > > Michael > > On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano > wrote: > > > > Hello all, I have some end users who want to receive on their load > balanced web servers the client ip address for acl. > > They also want the https connection is terminated on web servers and not > on load balancer. > > Can I solve with octavia ? > > I read haproxy can act as transparent only when it is the default router > of backends. > > In our use case the default router is not the load balancer. > > Any help, please? > > Ignazio > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Jul 17 18:22:06 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 20:22:06 +0200 Subject: [openstack][octavia] transparent In-Reply-To: <20200717173219.hv2vahdznwyjf3k7@yuggoth.org> References: <20200717173219.hv2vahdznwyjf3k7@yuggoth.org> Message-ID: Many thanks, Jeremy Il Ven 17 Lug 2020, 19:42 Jeremy Stanley ha scritto: > On 2020-07-17 18:55:01 +0200 (+0200), Ignazio Cassano wrote: > > Hello all, I have some end users who want to receive on their load > > balanced web servers the client ip address for acl. They also want > > the https connection is terminated on web servers and not on load > > balancer. Can I solve with octavia ? I read haproxy can act as > > transparent only when it is the default router of backends. In our > > use case the default router is not the load balancer. Any help, > > please? > > You'll be hard pressed to find any network load balancer which can > satisfy this combination of requirements without also requiring some > cooperation from the gateway. The ways you typically get the client > IP addresses to your servers are one of: > > 1. Use the load balancer as the default router for the servers so > that it doesn't need to alter the IP addresses of the packets (layer > 3 forwarding). > > 2. Terminate SSL/TLS on the load balancer so that it can insert > X-Forwarded-For headers into the HTTP requests, and then optionally > re-encrypt when sending along to the servers (layer 7 forwarding). > > 3. A "direct server return" configuration where the load balancer > masquerades as the clients and only handles the inbound packets to > the servers, while the outbound replies from the servers go directly > to the Internet through their default gateway (asymmetric layer 3 > forwarding with destination NAT). This is the only option which > meets the list of requirements you posed and it's exceptionally > messy to implement, since you can't rely on state tracking either on > the load balancer or the default gateway (each of them only sees > half of the connection). This can also thoroughly confuse your > packet filtering depending on where in your network it's applied. > > A bit of quick searching doesn't turn up any available amphorae for > Octavia which support DSR, but even if there were I expect you'd > face challenges adapting Neutron and security groups to handle it. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Jul 17 18:23:37 2020 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 17 Jul 2020 18:23:37 +0000 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: <20200717182337.i7a2lwpatempbz2x@yuggoth.org> On 2020-07-17 17:17 +0000 (+0000), Michael Johnson write: [...] > To use the PROXY protocol you would set up the load balancer like this: > 1. Create the load balancer. > 2. Create the listener using HTTPS pass through, so either the "HTTPS" > or "TCP" protocol. > 3. Create the pool using the "PROXY" protocol option. > 4. Add your members and health manager as you normally do. > > Then, on the web servers enable PROXY protocol. > On apache this is via the mod_remoteip module and the > RemoteIPProxyProtocol directive. See: > > https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol > On nginx it is enabled with the "proxy_protocol" directive. See: > > https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ > > Pretty much every web server has support for it. [...] Neat! Somehow this is the first I've heard of it. An attempt at a formal specification seems to be published at http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt but I'm not finding any corresponding IETF RFC draft. I agree it looks like a viable solution to the question posed (so long as the LB and servers have support for this custom protocol/encapsulation). Way less problematic than DSR, just unfortunately handled as a de facto standard from what I can see, but looks like https://tools.ietf.org/id/draft-schwartz-tls-lb-00.html touches on ways to hopefully provide a more extensible solution in the future. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ignaziocassano at gmail.com Fri Jul 17 18:25:49 2020 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 17 Jul 2020 20:25:49 +0200 Subject: [openstack][octavia] transparent In-Reply-To: References: Message-ID: I mean acl on load balancer not on web servers..... Il Ven 17 Lug 2020, 20:20 Ignazio Cassano ha scritto: > Hello Michael, I forgot to ask if the configuration you suggested can > support acl for clients ip address. > Ignazio > > Il Ven 17 Lug 2020, 19:17 Michael Johnson ha > scritto: > >> Hi Ignazio, >> >> Currently the amphora driver does not support passing the client >> source IP directly to the backend member server. >> >> However there are a few ways to accomplish this using the amphora driver: >> 1. Use the proxy protocol for the pool. >> 2. Terminate the HTTPS on the load balancer and add the X-Forwarded-For >> header. >> >> To use the PROXY protocol you would set up the load balancer like this: >> 1. Create the load balancer. >> 2. Create the listener using HTTPS pass through, so either the "HTTPS" >> or "TCP" protocol. >> 3. Create the pool using the "PROXY" protocol option. >> 4. Add your members and health manager as you normally do. >> >> Then, on the web servers enable PROXY protocol. >> On apache this is via the mod_remoteip module and the >> RemoteIPProxyProtocol directive. See: >> >> https://httpd.apache.org/docs/2.4/mod/mod_remoteip.html#remoteipproxyprotocol >> On nginx it is enabled with the "proxy_protocol" directive. See: >> >> https://docs.nginx.com/nginx/admin-guide/load-balancer/using-proxy-protocol/ >> >> Pretty much every web server has support for it. >> >> Michael >> >> On Fri, Jul 17, 2020 at 10:01 AM Ignazio Cassano >> wrote: >> > >> > Hello all, I have some end users who want to receive on their load >> balanced web servers the client ip address for acl. >> > They also want the https connection is terminated on web servers and >> not on load balancer. >> > Can I solve with octavia ? >> > I read haproxy can act as transparent only when it is the default >> router of backends. >> > In our use case the default router is not the load balancer. >> > Any help, please? >> > Ignazio >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at goirand.fr Fri Jul 17 07:47:22 2020 From: thomas at goirand.fr (Thomas Goirand) Date: Fri, 17 Jul 2020 09:47:22 +0200 Subject: [largescale-sig] OpenStack DB Archiver In-Reply-To: <20200716133127.GA31915@sync> References: <20200716133127.GA31915@sync> Message-ID: <21b85e64-5cbf-d6e1-a739-50b74d9585a2@goirand.fr> On 7/16/20 3:31 PM, Arnaud Morin wrote: > Hello large-scalers! > > TLDR: we opensource a tool to help reducing size of databases. > See https://github.com/ovh/osarchiver/ > > > Few months ago, we released a tool, name osarchiver, which we are using > on our production environment (at OVH) to help reduce the size of our > tables in mariadb (or mysql) > > In fact, some tables are well know to grow very quickly. > > We use it, for example, to clean the OpenStack mistral database from old > tasks, actions and executions which are older than a year. > > Another use case could be to archive some data in another table (e.g. with > _archived as suffix) if they are 6 months old, and delete this data after > 1 year. > > The source code of this tool is available here: > https://github.com/ovh/osarchiver/ > > We were wondering if some other users would be interested in using the > tool, and maybe move it under the opendev governance? > > Feel free to contact us and/or answer this thread. > > Cheers, Hi, That's very nice, thanks a lot for releasing such a thing. However, there's room for improvement if you would like to see your tool shipped everywhere: - please define a requirements.txt - please get the debian folder away from the main master branch, especially considering it's using dh_virtualenv !!! - please tag with a release number Also, with what release of OpenStack has this been tested? Is this bound to a specific release? Cheers, Thomas Goirand (zigo) From alex.williamson at redhat.com Fri Jul 17 14:59:35 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 08:59:35 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200715082040.GA13136@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> Message-ID: <20200717085935.224ffd46@x1.home> On Wed, 15 Jul 2020 16:20:41 +0800 Yan Zhao wrote: > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > > if one device is able to migrate to another device before triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __ userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > > ________/__________ ___\|/_____________ > > > > > > | migration_version | | migration_version |-->check migration > > > > > > --------------------- --------------------- compatibility > > > > > > device A device B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > > - any one of the two devices does not have a migration_version attribute > > > > > > - error when reading from migration_version attribute of one device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. > > > > In the reply I just sent to Dan, I gave this example of what a > > "compatibility string" might look like represented as json: > > > > { > > "device_api": "vfio-pci", > > "vendor": "vendor-driver-name", > > "version": { > > "major": 0, > > "minor": 1 > > }, > > "vfio-pci": { // Based on above device_api > > "vendor": 0x1234, // Values for the exposed device > > "device": 0x5678, > > // Possibly further parameters for a more specific match > > }, > > "mdev_attrs": [ > > { "attribute0": "VALUE" } > > ] > > } > > > > Are you thinking that we might allow the vendor to include a vendor > > specific array where we'd simply require that both sides have matching > > fields and values? ie. > > > > "vendor_fields": [ > > { "unknown_field0": "unknown_value0" }, > > { "unknown_field1": "unknown_value1" }, > > ] > > > > We could certainly make that part of the spec, but I can't really > > figure the value of it other than to severely restrict compatibility, > > which the vendor could already do via the version.major value. Maybe > > they'd want to put a build timestamp, random uuid, or source sha1 into > > such a field to make absolutely certain compatibility is only determined > > between identical builds? Thanks, > > > Yes, I agree kernel could expose such sysfs interface to educate > openstack how to filter out devices. But I still think the proposed > migration_version (or rename to migration_compatibility) interface is > still required for libvirt to do double check. > > In the following scenario: > 1. openstack chooses the target device by reading sysfs interface (of json > format) of the source device. And Openstack are now pretty sure the two > devices are migration compatible. > 2. openstack asks libvirt to create the target VM with the target device > and start live migration. > 3. libvirt now receives the request. so it now has two choices: > (1) create the target VM & target device and start live migration directly > (2) double check if the target device is compatible with the source > device before doing the remaining tasks. > > Because the factors to determine whether two devices are live migration > compatible are complicated and may be dynamically changing, (e.g. driver > upgrade or configuration changes), and also because libvirt should not > totally rely on the input from openstack, I think the cost for libvirt is > relatively lower if it chooses to go (2) than (1). At least it has no > need to cancel migration and destroy the VM if it knows it earlier. > > So, it means the kernel may need to expose two parallel interfaces: > (1) with json format, enumerating all possible fields and comparing > methods, so as to indicate openstack how to find a matching target device > (2) an opaque driver defined string, requiring write and test in target, > which is used by libvirt to make sure device compatibility, rather than > rely on the input accurateness from openstack or rely on kernel driver > implementing the compatibility detection immediately after migration > start. > > Does it make sense? No, libvirt is not responsible for the success or failure of the migration, it's the vendor driver's responsibility to encode compatibility information early in the migration stream and error should the incoming device prove to be incompatible. It's not libvirt's job to second guess the management engine and I would not support a duplicate interface only for that purpose. Thanks, Alex From alex.williamson at redhat.com Fri Jul 17 15:18:54 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 09:18:54 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> Message-ID: <20200717091854.72013c91@x1.home> On Wed, 15 Jul 2020 15:37:19 +0800 Alex Xu wrote: > Alex Williamson 于2020年7月15日周三 上午5:00写道: > > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that > > helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > > are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of > > the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step to > > check > > > > > > if one device is able to migrate to another device before > > triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. > > please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __ userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > > ________/__________ ___\|/_____________ > > > > > > | migration_version | | migration_version |-->check migration > > > > > > --------------------- --------------------- compatibility > > > > > > device A device B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > > device's > > > > > > sysfs node. e.g. > > (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the > > source device, > > > > > > and write it to the migration_version sysfs attribute in the > > target device. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices > > not compatible: > > > > > > - any one of the two devices does not have a migration_version > > attribute > > > > > > - error when reading from migration_version attribute of one device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by > > device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > > "aggregator count". > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > > driver name to > > > > > > each migration_version string. e.g. > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver > > version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. > > > > In the reply I just sent to Dan, I gave this example of what a > > "compatibility string" might look like represented as json: > > > > { > > "device_api": "vfio-pci", > > "vendor": "vendor-driver-name", > > "version": { > > "major": 0, > > "minor": 1 > > }, > > > > The OpenStack Placement service doesn't support to filtering the target > host by the semver syntax, altough we can code this filtering logic inside > scheduler filtering by python code. Basically, placement only supports > filtering the host by traits (it is same thing with labels, tags). The nova > scheduler will call the placement service to filter the hosts first, then > go through all the scheduler filters. That would be great if the placement > service can filter out more hosts which isn't compatible first, and then it > is better. > > > > "vfio-pci": { // Based on above device_api > > "vendor": 0x1234, // Values for the exposed device > > "device": 0x5678, > > // Possibly further parameters for a more specific match > > }, > > > > OpenStack already based on vendor and device id to separate the devices > into the different resource pool, then the scheduler based on that to filer > the hosts, so I think it needn't be the part of this compatibility string. This is the part of the string that actually says what the resulting device is, so it's a rather fundamental part of the description. This is where we'd determine that a physical to mdev migration is possible or that different mdev types result in the same guest PCI device, possibly with attributes set as specified later in the output. > > "mdev_attrs": [ > > { "attribute0": "VALUE" } > > ] > > } > > > > Are you thinking that we might allow the vendor to include a vendor > > specific array where we'd simply require that both sides have matching > > fields and values? ie. That's what I'm defining in the below vendor_fields, the above mdev_attrs would be specifying attributes of the device that must be set in order to create the device with the compatibility described. For example if we're describing compatibility for type foo-1, which is a base type that can be equivalent to type foo-3 if type foo-1 is created with aggregation=3, this is where that would be defined. Thanks, Alex > > "vendor_fields": [ > > { "unknown_field0": "unknown_value0" }, > > { "unknown_field1": "unknown_value1" }, > > ] > > > > Since the placement support traits (labels, tags), so the placement just to > matching those fields, so it isn't problem of openstack, since openstack > needn't to know the meaning of those fields. But the traits is just a > label, it isn't key-value format. But also if we have to, we can code this > scheduler filter by python code. But the same thing as above, the invalid > host can't be filtered out in the first step placement service filtering. > > > > We could certainly make that part of the spec, but I can't really > > figure the value of it other than to severely restrict compatibility, > > which the vendor could already do via the version.major value. Maybe > > they'd want to put a build timestamp, random uuid, or source sha1 into > > such a field to make absolutely certain compatibility is only determined > > between identical builds? Thanks, > > > > Alex > > > > From alex.williamson at redhat.com Fri Jul 17 16:12:58 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 10:12:58 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200716083230.GA25316@joy-OptiPlex-7040> References: <20200713232957.GD5955@joy-OptiPlex-7040> <9bfa8700-91f5-ebb4-3977-6321f0487a63@redhat.com> <20200716083230.GA25316@joy-OptiPlex-7040> Message-ID: <20200717101258.65555978@x1.home> On Thu, 16 Jul 2020 16:32:30 +0800 Yan Zhao wrote: > On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote: > > > > On 2020/7/14 上午7:29, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to check > > > if one device is able to migrate to another device before triggering a real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __ userspace > > > /\ \ > > > / \write > > > / read \ > > > ________/__________ ___\|/_____________ > > > | migration_version | | migration_version |-->check migration > > > --------------------- --------------------- compatibility > > > device A device B > > > > > > > > > a device attribute named migration_version is defined under each device's > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > Are you aware of the devlink based device management interface that is > > proposed upstream? I think it has many advantages over sysfs, do you > > consider to switch to that? Advantages, such as? > not familiar with the devlink. will do some research of it. > > > > > > > userspace tools read the migration_version as a string from the source device, > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > - any one of the two devices does not have a migration_version attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device vendor > > > driver and is completely opaque to the userspace. > > > > > > My understanding is that something opaque to userspace is not the philosophy > > but the VFIO live migration in itself is essentially a big opaque stream to userspace. > > > of Linux. Instead of having a generic API but opaque value, why not do in a > > vendor specific way like: > > > > 1) exposing the device capability in a vendor specific way via sysfs/devlink > > or other API > > 2) management read capability in both src and dst and determine whether we > > can do the migration > > > > This is the way we plan to do with vDPA. > > > yes, in another reply, Alex proposed to use an interface in json format. > I guess we can define something like > > { "self" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1", > "pv-mode" : "none", > } > ], > "compatible" : > [ > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_2", > "aggregator" : "1" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v1", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none", > }, > { "pciid" : "8086591d", > "driver" : "i915", > "gvt-version" : "v2", > "mdev_type" : "i915-GVTg_V5_4", > "aggregator" : "2" > "pv-mode" : "none, ppgtt, context", > } > ... > ] > } > > But as those fields are mostly vendor specific, the userspace can > only do simple string comparing, I guess the list would be very long as > it needs to enumerate all possible targets. This ignores so much of what I tried to achieve in my example :( > also, in some fileds like "gvt-version", is there a simple way to express > things like v2+? That's not a reasonable thing to express anyway, how can you be certain that v3 won't break compatibility with v2? Sean proposed a versioning scheme that accounts for this, using an x.y.z version expressing the major, minor, and bugfix versions, where there is no compatibility across major versions, minor versions have forward compatibility (ex. 1 -> 2 is ok, 2 -> 1 is not) and bugfix version number indicates some degree of internal improvement that is not visible to the user in terms of features or compatibility, but provides a basis for preferring equally compatible candidates. > If the userspace can read this interface both in src and target and > check whether both src and target are in corresponding compatible list, I > think it will work for us. > > But still, kernel should not rely on userspace's choice, the opaque > compatibility string is still required in kernel. No matter whether > it would be exposed to userspace as an compatibility checking interface, > vendor driver would keep this part of code and embed the string into the > migration stream. so exposing it as an interface to be used by libvirt to > do a safety check before a real live migration is only about enabling > the kernel part of check to happen ahead. As you indicate, the vendor driver is responsible for checking version information embedded within the migration stream. Therefore a migration should fail early if the devices are incompatible. Is it really libvirt's place to second guess what it has been directed to do? Why would we even proceed to design a user parse-able version interface if we still have a dependency on an opaque interface? Thanks, Alex From dgilbert at redhat.com Fri Jul 17 18:03:44 2020 From: dgilbert at redhat.com (Dr. David Alan Gilbert) Date: Fri, 17 Jul 2020 19:03:44 +0100 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200717085935.224ffd46@x1.home> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> <20200717085935.224ffd46@x1.home> Message-ID: <20200717180344.GD3294@work-vm> * Alex Williamson (alex.williamson at redhat.com) wrote: > On Wed, 15 Jul 2020 16:20:41 +0800 > Yan Zhao wrote: > > > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > > On Tue, 14 Jul 2020 18:19:46 +0100 > > > "Dr. David Alan Gilbert" wrote: > > > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > > hi folks, > > > > > > > we are defining a device migration compatibility interface that helps upper > > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > > > live migration compatible. > > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > > > e.g. we could use it to check whether > > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > > > if one device is able to migrate to another device before triggering a real > > > > > > > live migration procedure. > > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > > The interface is defined in below way: > > > > > > > > > > > > > > __ userspace > > > > > > > /\ \ > > > > > > > / \write > > > > > > > / read \ > > > > > > > ________/__________ ___\|/_____________ > > > > > > > | migration_version | | migration_version |-->check migration > > > > > > > --------------------- --------------------- compatibility > > > > > > > device A device B > > > > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > > > - any one of the two devices does not have a migration_version attribute > > > > > > > - error when reading from migration_version attribute of one device > > > > > > > - error when writing migration_version string of one device to > > > > > > > migration_version attribute of the other device > > > > > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > > > driver and is completely opaque to the userspace. > > > > > > > for a Intel vGPU, string format can be defined like > > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > > > for a QAT VF, it may be > > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > > the contents of that opaque string. The point is that its contents > > > > > are defined by the vendor driver to describe the device, driver version, > > > > > and possibly metadata about the configuration of the device. One > > > > > instance of a device might generate a different string from another. > > > > > The string that a device produces is not necessarily the only string > > > > > the vendor driver will accept, for example the driver might support > > > > > backwards compatible migrations. > > > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > > this string; I'd expect to have an ID and version that are human > > > > readable, maybe a device ID/name that's human interpretable and then a > > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > > > I'm thinking that we want to be able to report problems and include the > > > > string and the user to be able to easily identify the device that was > > > > complaining and notice a difference in versions, and perhaps also use > > > > it in compatibility patterns to find compatible hosts; but that does > > > > get tricky when it's a 'ask the device if it's compatible'. > > > > > > In the reply I just sent to Dan, I gave this example of what a > > > "compatibility string" might look like represented as json: > > > > > > { > > > "device_api": "vfio-pci", > > > "vendor": "vendor-driver-name", > > > "version": { > > > "major": 0, > > > "minor": 1 > > > }, > > > "vfio-pci": { // Based on above device_api > > > "vendor": 0x1234, // Values for the exposed device > > > "device": 0x5678, > > > // Possibly further parameters for a more specific match > > > }, > > > "mdev_attrs": [ > > > { "attribute0": "VALUE" } > > > ] > > > } > > > > > > Are you thinking that we might allow the vendor to include a vendor > > > specific array where we'd simply require that both sides have matching > > > fields and values? ie. > > > > > > "vendor_fields": [ > > > { "unknown_field0": "unknown_value0" }, > > > { "unknown_field1": "unknown_value1" }, > > > ] > > > > > > We could certainly make that part of the spec, but I can't really > > > figure the value of it other than to severely restrict compatibility, > > > which the vendor could already do via the version.major value. Maybe > > > they'd want to put a build timestamp, random uuid, or source sha1 into > > > such a field to make absolutely certain compatibility is only determined > > > between identical builds? Thanks, > > > > > Yes, I agree kernel could expose such sysfs interface to educate > > openstack how to filter out devices. But I still think the proposed > > migration_version (or rename to migration_compatibility) interface is > > still required for libvirt to do double check. > > > > In the following scenario: > > 1. openstack chooses the target device by reading sysfs interface (of json > > format) of the source device. And Openstack are now pretty sure the two > > devices are migration compatible. > > 2. openstack asks libvirt to create the target VM with the target device > > and start live migration. > > 3. libvirt now receives the request. so it now has two choices: > > (1) create the target VM & target device and start live migration directly > > (2) double check if the target device is compatible with the source > > device before doing the remaining tasks. > > > > Because the factors to determine whether two devices are live migration > > compatible are complicated and may be dynamically changing, (e.g. driver > > upgrade or configuration changes), and also because libvirt should not > > totally rely on the input from openstack, I think the cost for libvirt is > > relatively lower if it chooses to go (2) than (1). At least it has no > > need to cancel migration and destroy the VM if it knows it earlier. > > > > So, it means the kernel may need to expose two parallel interfaces: > > (1) with json format, enumerating all possible fields and comparing > > methods, so as to indicate openstack how to find a matching target device > > (2) an opaque driver defined string, requiring write and test in target, > > which is used by libvirt to make sure device compatibility, rather than > > rely on the input accurateness from openstack or rely on kernel driver > > implementing the compatibility detection immediately after migration > > start. > > > > Does it make sense? > > No, libvirt is not responsible for the success or failure of the > migration, it's the vendor driver's responsibility to encode > compatibility information early in the migration stream and error > should the incoming device prove to be incompatible. It's not > libvirt's job to second guess the management engine and I would not > support a duplicate interface only for that purpose. Thanks, libvirt does try to enforce it for other things; trying to stop a bad migration from starting. Dave > Alex -- Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK From alex.williamson at redhat.com Fri Jul 17 18:30:26 2020 From: alex.williamson at redhat.com (Alex Williamson) Date: Fri, 17 Jul 2020 12:30:26 -0600 Subject: device compatibility interface for live migration with assigned devices In-Reply-To: <20200717180344.GD3294@work-vm> References: <20200713232957.GD5955@joy-OptiPlex-7040> <20200714102129.GD25187@redhat.com> <20200714101616.5d3a9e75@x1.home> <20200714171946.GL2728@work-vm> <20200714145948.17b95eb3@x1.home> <20200715082040.GA13136@joy-OptiPlex-7040> <20200717085935.224ffd46@x1.home> <20200717180344.GD3294@work-vm> Message-ID: <20200717123026.6ab26442@x1.home> On Fri, 17 Jul 2020 19:03:44 +0100 "Dr. David Alan Gilbert" wrote: > * Alex Williamson (alex.williamson at redhat.com) wrote: > > On Wed, 15 Jul 2020 16:20:41 +0800 > > Yan Zhao wrote: > > > > > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > > > On Tue, 14 Jul 2020 18:19:46 +0100 > > > > "Dr. David Alan Gilbert" wrote: > > > > > > > > > * Alex Williamson (alex.williamson at redhat.com) wrote: > > > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > > > hi folks, > > > > > > > > we are defining a device migration compatibility interface that helps upper > > > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > > > > live migration compatible. > > > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the two. > > > > > > > > e.g. we could use it to check whether > > > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > > > > > The upper layer stack could use this interface as the last step to check > > > > > > > > if one device is able to migrate to another device before triggering a real > > > > > > > > live migration procedure. > > > > > > > > we are not sure if this interface is of value or help to you. please don't > > > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > > > The interface is defined in below way: > > > > > > > > > > > > > > > > __ userspace > > > > > > > > /\ \ > > > > > > > > / \write > > > > > > > > / read \ > > > > > > > > ________/__________ ___\|/_____________ > > > > > > > > | migration_version | | migration_version |-->check migration > > > > > > > > --------------------- --------------------- compatibility > > > > > > > > device A device B > > > > > > > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each device's > > > > > > > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version). > > > > > > > > userspace tools read the migration_version as a string from the source device, > > > > > > > > and write it to the migration_version sysfs attribute in the target device. > > > > > > > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not compatible: > > > > > > > > - any one of the two devices does not have a migration_version attribute > > > > > > > > - error when reading from migration_version attribute of one device > > > > > > > > - error when writing migration_version string of one device to > > > > > > > > migration_version attribute of the other device > > > > > > > > > > > > > > > > The string read from migration_version attribute is defined by device vendor > > > > > > > > driver and is completely opaque to the userspace. > > > > > > > > for a Intel vGPU, string format can be defined like > > > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". > > > > > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > > > > > for a QAT VF, it may be > > > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a driver name to > > > > > > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > > > the contents of that opaque string. The point is that its contents > > > > > > are defined by the vendor driver to describe the device, driver version, > > > > > > and possibly metadata about the configuration of the device. One > > > > > > instance of a device might generate a different string from another. > > > > > > The string that a device produces is not necessarily the only string > > > > > > the vendor driver will accept, for example the driver might support > > > > > > backwards compatible migrations. > > > > > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > > > this string; I'd expect to have an ID and version that are human > > > > > readable, maybe a device ID/name that's human interpretable and then a > > > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > > > > > I'm thinking that we want to be able to report problems and include the > > > > > string and the user to be able to easily identify the device that was > > > > > complaining and notice a difference in versions, and perhaps also use > > > > > it in compatibility patterns to find compatible hosts; but that does > > > > > get tricky when it's a 'ask the device if it's compatible'. > > > > > > > > In the reply I just sent to Dan, I gave this example of what a > > > > "compatibility string" might look like represented as json: > > > > > > > > { > > > > "device_api": "vfio-pci", > > > > "vendor": "vendor-driver-name", > > > > "version": { > > > > "major": 0, > > > > "minor": 1 > > > > }, > > > > "vfio-pci": { // Based on above device_api > > > > "vendor": 0x1234, // Values for the exposed device > > > > "device": 0x5678, > > > > // Possibly further parameters for a more specific match > > > > }, > > > > "mdev_attrs": [ > > > > { "attribute0": "VALUE" } > > > > ] > > > > } > > > > > > > > Are you thinking that we might allow the vendor to include a vendor > > > > specific array where we'd simply require that both sides have matching > > > > fields and values? ie. > > > > > > > > "vendor_fields": [ > > > > { "unknown_field0": "unknown_value0" }, > > > > { "unknown_field1": "unknown_value1" }, > > > > ] > > > > > > > > We could certainly make that part of the spec, but I can't really > > > > figure the value of it other than to severely restrict compatibility, > > > > which the vendor could already do via the version.major value. Maybe > > > > they'd want to put a build timestamp, random uuid, or source sha1 into > > > > such a field to make absolutely certain compatibility is only determined > > > > between identical builds? Thanks, > > > > > > > Yes, I agree kernel could expose such sysfs interface to educate > > > openstack how to filter out devices. But I still think the proposed > > > migration_version (or rename to migration_compatibility) interface is > > > still required for libvirt to do double check. > > > > > > In the following scenario: > > > 1. openstack chooses the target device by reading sysfs interface (of json > > > format) of the source device. And Openstack are now pretty sure the two > > > devices are migration compatible. > > > 2. openstack asks libvirt to create the target VM with the target device > > > and start live migration. > > > 3. libvirt now receives the request. so it now has two choices: > > > (1) create the target VM & target device and start live migration directly > > > (2) double check if the target device is compatible with the source > > > device before doing the remaining tasks. > > > > > > Because the factors to determine whether two devices are live migration > > > compatible are complicated and may be dynamically changing, (e.g. driver > > > upgrade or configuration changes), and also because libvirt should not > > > totally rely on the input from openstack, I think the cost for libvirt is > > > relatively lower if it chooses to go (2) than (1). At least it has no > > > need to cancel migration and destroy the VM if it knows it earlier. > > > > > > So, it means the kernel may need to expose two parallel interfaces: > > > (1) with json format, enumerating all possible fields and comparing > > > methods, so as to indicate openstack how to find a matching target device > > > (2) an opaque driver defined string, requiring write and test in target, > > > which is used by libvirt to make sure device compatibility, rather than > > > rely on the input accurateness from openstack or rely on kernel driver > > > implementing the compatibility detection immediately after migration > > > start. > > > > > > Does it make sense? > > > > No, libvirt is not responsible for the success or failure of the > > migration, it's the vendor driver's responsibility to encode > > compatibility information early in the migration stream and error > > should the incoming device prove to be incompatible. It's not > > libvirt's job to second guess the management engine and I would not > > support a duplicate interface only for that purpose. Thanks, > > libvirt does try to enforce it for other things; trying to stop a bad > migration from starting. Even if libvirt did want to verify why would we want to support a separate opaque interface for that purpose versus a parse-able interface? If we get different results, we've failed. Thanks, Alex From johnsomor at gmail.com Fri Jul 17 19:12:20 2020 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 17 Jul 2020 12:12:20 -0700 Subject: [octavia] Proposal to deprecate the amphora spares pool Message-ID: Back at the Victoria PTG the Octavia team discussed deprecating the spares pool capability of the amphora driver[1]. This would follow the standard OpenStack deprecation process[2]. There are a number of reasons this was proposed: 1. It adds a lot of complexity to the code. 2. It can't be used with Active/Standby load balancers due to server group (anti-affinity) limitations in Nova. 3. It provides only 15-30 seconds of speedup when provisioning a new load balancer on production clouds. 4. It makes supporting Octavia availability zones awkward as we have to boot spares instances in each AZ. 5. It can be confusing for people when it is enabled as there are always extra amphora running and being automatically recreated. Due to these reasons a patch has been proposed to deprecate spares pool support in the amphora driver: https://review.opendev.org/741686 Please comment on that patch and/or join the weekly Octavia IRC meeting if you have any concerns with this deprecation plan. Michael [1] https://etherpad.opendev.org/p/octavia-virtual-V-ptg [2] https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html From sean.mcginnis at gmx.com Fri Jul 17 22:43:33 2020 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Fri, 17 Jul 2020 17:43:33 -0500 Subject: [PTL][Stable] Releases proposed for stable/stein Message-ID: <821d2cfa-1aeb-d532-0b56-db3918ab0215@gmx.com> /me takes off release team hat and puts on stable team hat Hey everyone, To help out with stable releases, I've run a script to propose releases for any deliverables in stable/stein that had commits merged but not released yet. This is just to try to help make sure those fixes get out downstream, and to help ease the crunch that we inevitably have near the time that stable/stein goes into Extended Maintenance mode (this coming November). These are not driven by the release team, and they are not required. They are merely a convenience to help out the teams. If there is a patch for any deliverables owned by your team and you are good with the release, please leave a +1 and we will process it. Any patches with a -1, or anything not acknowledged by the end of next week, will just be abandoned. Of course, stable releases can be proposed by the team whenever they are ready. Again, this is not a release team activity. This may or may not be done regularly. I just had some time and an itch to do it. Patches can be found here: https://review.opendev.org/#/q/topic:stein-stable+(status:open+OR+status:merged) Thanks! Sean From anilj.mailing at gmail.com Sat Jul 18 06:00:19 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Fri, 17 Jul 2020 23:00:19 -0700 Subject: RabbitMQ consumer connection is refused when trying to read notifications In-Reply-To: References: Message-ID: Hi, Can someone please provide some clue on this issue? /anil. On Thu, Jul 16, 2020 at 2:41 PM Anil Jangam wrote: > Hi, > > I followed the video and the steps provided in this video link and the > consumer connection is being refused. > > https://www.openstack.org/videos/summits/denver-2019/nova-versioned-notifications-the-result-of-a-3-year-journey > > /etc/nova/nova.conf file changes.. > [notifications] > notify_on_state_change=vm_state > default_level=INFO > notification_format=both > > [oslo_messaging_notifications] > driver=messagingv2 > transport_url=rabbit://guest:guest at 10.30.8.57:5672/ > topics=notification > retry=-1 > > The python consume code is as follows (followed the example provided in > the video: > transport = oslo_messaging.get_notification_transport( > cfg.CONF, url='rabbit://guest:guest at 10.30.8.57:5672/') > targets = [ > oslo_messaging.Target(topic='versioned_notifications'), > ] > > Am I missing any other configuration in any of the services in OpenStack? > > Let me know if you need any other info. > > /anil. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anilj.mailing at gmail.com Sat Jul 18 06:13:41 2020 From: anilj.mailing at gmail.com (Anil Jangam) Date: Fri, 17 Jul 2020 23:13:41 -0700 Subject: SDK API to get the version of openstack distribution Message-ID: Hi, I am able to iterate through the list of hypervisors and servers as follows. for hypervisor in self.connection.list_hypervisors(): for server in self.connection.compute.servers(): However, I could not find an API that returns the version of the OpenStack i.e. whether it is Stein, Train, or Ussuri. Openstack CLI client has command: openstack versions show Thanks, /anil. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radoslaw.piliszek at gmail.com Sat Jul 18 09:38:20 2020 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Sat, 18 Jul 2020 11:38:20 +0200 Subject: [qa][dev][all] Gate issues with devstack glance standalone w/o tls-proxy Message-ID: Morning, Folks! It seems the devstack glance standalone mode (the new default) is broken at the moment if not using tls-proxy. If your jobs break on g-api not coming up, then this is the likely case. So far it seems to have hit Neutron and Nodepool jobs (and hence also SDK and DIB for example). Please refrain from rechecking until solved. -yoctozepto From gmann at ghanshyammann.com Sat Jul 18 18:59:31 2020 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 18 Jul 2020 13:59:31 -0500 Subject: [qa][dev][all] Gate issues with devstack glance standalone w/o tls-proxy In-Reply-To: References: Message-ID: <173634bb16c.e1bb411a266116.18079246085083872@ghanshyammann.com> ---- On Sat, 18 Jul 2020 04:38:20 -0500 Radosław Piliszek wrote ---- > Morning, Folks! > > It seems the devstack glance standalone mode (the new default) is > broken at the moment if not using tls-proxy. If your jobs break on > g-api not coming up, then this is the likely case. > So far it seems to have hit Neutron and Nodepool jobs (and hence also > SDK and DIB for ex